JP2003280696A - Voice enhancement device and voice enhancement method

Info

Publication number: JP2003280696A
Application number: JP2002077327A
Authority: JP
Inventors: Yoka O; 幼華王; Koji Yoshida; 幸司吉田
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 2002-03-19
Filing date: 2002-03-19
Publication date: 2003-10-02
Anticipated expiration: 2022-03-19
Also published as: JP3960834B2

Abstract

(57) [Abstract] [Problem] To remove noise sufficiently with little speech distortion. [Solution] A second comb filter generation unit 109 sets the threshold of a second speech/non-speech discrimination unit 107 high so as not to be affected by noise information and, based on the presence or absence of a speech component at each frequency component, generates a comb filter that serves as the reference for restoring the speech pitch harmonic structure. A pitch estimation unit 111 estimates the speech pitch period from the speech spectrum output by a frequency division unit 104 and outputs the estimation result to a pitch harmonic structure restoration unit 112. The pitch harmonic structure restoration unit 112 restores the pitch harmonic structure based on this estimation result and the result of the second comb filter generation unit 109, and outputs the result to a comb filter correction unit 113. The comb filter correction unit 113 corrects the comb filter by combining the result output from the pitch harmonic structure restoration unit 112 with the result output from a first comb filter generation unit 108.

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a speech enhancement device and a speech enhancement method, and in particular to a speech enhancement device and a speech enhancement method suitable for use in communication.

[0002]

[Prior Art] In conventional speech-coded communication, coding speech at a low bit rate provides high-quality calls for speech without background noise. For speech that contains background noise, however, it produces the harsh distortion peculiar to low-bit-rate coding, and sound quality deteriorates.

[0003] To deal with this degradation in sound quality, noise suppression processing is sometimes applied. Speech enhancement techniques for such noise suppression include the spectral subtraction method and the comb filter method.

[0004] The spectral subtraction method (SS method) focuses on noise information. It estimates the properties of the noise in silent intervals and suppresses noise by estimating the power spectrum of the speech signal, either by subtracting the short-time power spectrum of the noise from the short-time power spectrum of the noisy speech signal or by multiplying by an attenuation coefficient. The SS method is described in, for example, Document 1 (S. Boll, Suppression of acoustic noise in speech using spectral subtraction, IEEE Trans. Acoustics, Speech, and Signal Processing, vol. ASSP-27, pp. 113-120, 1979) and Document 2 (R. J. McAulay, M. L. Malpass, Speech enhancement using a soft-decision noise suppression filter, IEEE Trans. Acoustics, Speech, and Signal Processing, vol. ASSP-28, pp. 137-145, 1980).
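The subtraction variant described in paragraph [0004] can be sketched as follows. This is a minimal illustration, not the implementation from Document 1 or from this patent: the function name `spectral_subtraction`, the `floor` parameter, and the clamping of negative results are assumptions made for the example.

```python
def spectral_subtraction(noisy_power, noise_power, floor=0.0):
    """Subtract an estimated noise power spectrum from a noisy power spectrum.

    Both arguments are per-bin short-time power values. Results below
    `floor` are clamped to `floor` (an assumption of this sketch; the
    source text does not say how negative differences are handled).
    """
    return [max(p - n, floor) for p, n in zip(noisy_power, noise_power)]


noisy = [4.0, 9.0, 1.5, 6.0]   # power spectrum of speech plus noise
noise = [2.0, 2.0, 2.0, 2.0]   # noise power estimated in silent intervals
print(spectral_subtraction(noisy, noise))  # [2.0, 7.0, 0.0, 4.0]
```

The third bin shows where trouble starts: wherever the instantaneous noise differs from its estimate, the subtraction leaves isolated residual values, which is the origin of the distortion discussed in the following paragraphs.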

[0005] However, the SS method produces distortion caused by the noise that remains after the noise is subtracted from the speech, specifically the residual noise between speech pitch peaks.

[0006] For example, the SS method of Document 1, used as a speech enhancement method, focuses only on noise information. It regards the short-time noise characteristics as stationary and uniformly subtracts the noise base (the estimated spectral characteristics of the noise) without distinguishing speech from noise. The SS method makes no use of speech information such as the speech pitch. In actual communication the noise characteristics are not stationary, so the noise remaining after subtraction, particularly between speech pitch peaks, causes an unnatural distortion known as "musical noise".

[0007] As a remedy for this unnatural distortion, methods have been proposed that attenuate noise by multiplying by an attenuation coefficient based on the ratio of speech power to noise power (SNR), for example those disclosed in Japanese Patent No. 2714656 and Japanese Patent Application No. Hei 9-518820.

[0008] The methods disclosed in these documents distinguish bands where speech is dominant (high SNR) from bands where noise is dominant (low SNR) and apply different attenuation coefficients to them, thereby suppressing musical noise and improving sound quality.

[0009] However, in the methods disclosed in Japanese Patent No. 2714656 and Japanese Patent Application No. Hei 9-518820, the number of frequency channels processed (16 channels) is insufficient, so it is difficult to separate speech pitch harmonic information from the noise and extract it.

[0010] Moreover, because attenuation coefficients are applied to both the speech bands and the noise bands, the two affect each other and the attenuation coefficient cannot be made large. For example, if the attenuation coefficient is made large, errors in SNR estimation may distort the speech. As a result, noise attenuation is insufficient.

[0011] The comb filter method, by contrast, focuses on speech information and attenuates noise by applying a comb filter matched to the speech pitch. A comb filter outputs the input signal, frequency region by frequency region, either attenuated by a predetermined ratio or unattenuated, and therefore has a comb-shaped attenuation characteristic. When the comb filter method is implemented in digital data processing, the attenuation characteristic of the comb filter is prepared as data for each frequency region, and noise is suppressed by multiplying the speech spectrum by this data at each frequency.
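The per-frequency multiplication described in paragraph [0011] can be sketched as a gain mask applied bin by bin. This is an illustrative sketch only: the function name `apply_comb_filter`, the boolean `pass_mask` representation, and the `stop_gain` value are assumptions, not details given in the source.

```python
def apply_comb_filter(spectrum, pass_mask, stop_gain=0.1):
    """Apply a comb-shaped gain to a spectrum, one value per frequency bin.

    Bins where `pass_mask` is True pass unattenuated; all other bins are
    multiplied by `stop_gain` (a hypothetical fixed attenuation ratio).
    """
    return [x if keep else x * stop_gain for x, keep in zip(spectrum, pass_mask)]


spectrum = [10.0, 2.0, 8.0, 1.0]
pass_mask = [True, False, True, False]   # passbands at the pitch harmonics
print(apply_comb_filter(spectrum, pass_mask, stop_gain=0.5))  # [10.0, 1.0, 8.0, 0.5]
```

Because the filter is just a table of gains, correcting it later (for example, reopening a bin that was wrongly closed) amounts to editing entries in `pass_mask`.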

[0012] A reference on the comb filter method is Document 3 (J. S. Lim et al., Evaluation of an adaptive comb filtering method for enhancing speech degraded by white noise addition, IEEE Trans. Acoustics, Speech, and Signal Processing, vol. ASSP-26, pp. 354-358, 1978).

[0013] In the conventional comb filter method, if there is an estimation error in the pitch, which is the fundamental frequency, the error is magnified at the harmonics, making it more likely that the true harmonic components fall outside the passbands. In addition, quasi-periodic speech must be distinguished from speech that is not quasi-periodic, which poses a problem for practical implementation. Furthermore, in mobile communication, simply suppressing noise also suppresses natural-sounding ambient noise, which can make a call feel unnatural.

[0014] As a way of eliminating this unnaturalness, there is a technique that separates speech from noise and encodes and decodes each of them, thereby reproducing good speech and ambient noise. An example is the method shown in Document 4 (Kimio Miseki, Masahiro Oshikiri, Low-rate speech coding based on speech/background noise separation, Proceedings of the Acoustical Society of Japan, pp. 235-236, March 1998).

[0015] The method of Document 4 performs speech enhancement using the SS method, takes the noise-suppressed input signal as the speech component, and takes the difference between the input signal and the speech component as the noise component. Because it is essentially the same as the SS method, it is difficult to obtain good speech and noise characteristics with this separation method.

[0016]

[Problems to Be Solved by the Invention] As described above, conventional devices have the problem that it is difficult to remove noise sufficiently while keeping speech distortion low.

[0017] The present invention has been made in view of these points. Its object is to provide a speech enhancement device that can remove noise sufficiently with little speech distortion, and to provide a speech enhancement device and a speech enhancement method that yield good speech and noise characteristics.

[0018]

[Means for Solving the Problems] The speech enhancement device of the present invention comprises: frequency division means for outputting a frequency-divided spectrum obtained by dividing the spectrum of an input signal into predetermined frequency units; first comb filter creation means for creating, based on the frequency-divided spectrum, a first comb filter that attenuates the signal in frequency regions judged to be silent; second comb filter creation means for creating, based on the frequency-divided spectrum, a second comb filter, which is a filter from which more noise peaks have been removed than from the first comb filter; comb filter correction means for correcting the speech pitch harmonic structure contained in the first comb filter using a speech pitch estimated from the second comb filter and the frequency-divided spectrum; suppression means for suppressing the noise in the frequency-divided spectrum using the first comb filter corrected by the comb filter correction means; and speech frequency synthesis means for synthesizing the noise-suppressed frequency-divided spectrum into a spectrum signal that is continuous in the frequency domain.

[0019] According to this configuration, peaks that are highly likely to be speech peaks are extracted from the spectrum of the speech signal to create a comb filter for estimating the speech pitch, and accurate speech pitch information is obtained from this comb filter. In addition, a comb filter for suppressing the noise signal is created by extracting as much speech information as possible, and this comb filter is used so that speech spectrum peaks buried in noise are not suppressed. A comb filter in which the missing speech pitch harmonic structure has been filled in on the basis of correct speech pitch information can thus be created, and suppressing the noise signal with this comb filter achieves speech enhancement with little speech distortion.

[0020] The speech enhancement device of the present invention comprises speech/noise frame detection means for determining, from the first comb filter and the second comb filter, whether the speech spectrum contains a speech component. When the speech/noise frame detection means determines that no speech component is contained, the comb filter correction means corrects the first comb filter so that it attenuates the signal at every frequency component.

[0021] In the speech enhancement device of the present invention, the speech/noise frame detection means takes as a first result the ratio of the sum of the input signal's power spectrum over the passbands of the first comb filter to the sum of the input signal's power spectrum over the stopbands of the first comb filter, and takes as a second result the ratio of the sum of the input signal's power spectrum over the passbands of the second comb filter to the sum over the stopbands of the second comb filter. When the sum of the first result and the second result is greater than a predetermined threshold, the means uses that sum, and when the sum is equal to or less than the predetermined threshold, it uses the second result, to determine whether the speech spectrum contains speech.
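The decision rule of paragraph [0021] can be sketched as below. The passband/stopband ratios and the choice between the summed value and the second result follow the text; the final comparison against `speech_threshold` is an assumption of this sketch, since the paragraph does not state how the selected value is turned into a speech/no-speech decision. All function and parameter names are hypothetical.

```python
def band_power_ratio(power, pass_mask):
    """Ratio of summed power in the passbands to summed power in the stopbands."""
    pass_sum = sum(p for p, keep in zip(power, pass_mask) if keep)
    stop_sum = sum(p for p, keep in zip(power, pass_mask) if not keep)
    return pass_sum / stop_sum if stop_sum > 0 else float("inf")


def frame_has_speech(power, mask1, mask2, select_threshold, speech_threshold):
    """Decide whether a frame contains speech from two comb filter masks."""
    first = band_power_ratio(power, mask1)    # first result (first comb filter)
    second = band_power_ratio(power, mask2)   # second result (second comb filter)
    total = first + second
    # Use the sum when it exceeds the threshold, otherwise the second result.
    score = total if total > select_threshold else second
    # Assumed final step: compare the selected value with a decision threshold.
    return score > speech_threshold


voiced = [8.0, 1.0, 6.0, 1.0]
flat = [1.0, 1.0, 1.0, 1.0]
mask1 = [True, False, True, True]    # first comb filter keeps more bins
mask2 = [True, False, True, False]   # second comb filter is stricter
print(frame_has_speech(voiced, mask1, mask2, 10.0, 5.0))  # True
print(frame_has_speech(flat, mask1, mask2, 10.0, 5.0))    # False
```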

[0022] According to these configurations, whether the speech spectrum contains a speech component is determined from the first comb filter and the second comb filter, and when the determination is that no speech component is contained, the first comb filter is made to attenuate the signal at every frequency component, so that suddenly occurring noise can be suppressed.

[0023] In the speech enhancement device of the present invention, the first comb filter generation means creates a first comb filter whose stopband is a predetermined range extending from each local minimum of the power spectrum of the input signal.

[0024] According to this configuration, a comb filter whose stopband is a predetermined range extending from each local minimum of the input signal's power spectrum is created, and the noise signal is suppressed using this comb filter. Even when the level difference between speech and noise is small, the speech pitch harmonic structure can therefore be extracted and restored, reducing speech distortion.

[0025] The speech enhancement device of the present invention comprises pitch estimation means that subtracts the noise base from the power spectrum of the input signal and estimates the speech pitch using the autocorrelation function of the subtraction result. The comb filter correction means corrects the speech pitch harmonic structure contained in the first comb filter using the speech pitch estimated by the pitch estimation means.

[0026] According to this configuration, the noise base is subtracted from the power spectrum of the input signal, the speech pitch is estimated from the autocorrelation function of the subtraction result, and the speech pitch harmonic structure contained in the comb filter is corrected using the estimated speech pitch. The pitch harmonic structure of the comb filter can thus be restored, achieving speech enhancement with little speech distortion.
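The pitch estimation of paragraphs [0025] and [0026] can be sketched as an autocorrelation taken along the frequency axis: harmonics spaced at a regular interval produce a peak at the lag equal to that interval. This is only a sketch under stated assumptions. The function name, the lag search range (`min_lag` up to half the number of bins), the clamping of negative differences to zero, and the unnormalized autocorrelation are not taken from the source.

```python
def estimate_harmonic_spacing(power, noise_base, min_lag=2):
    """Estimate the spacing (in bins) between pitch harmonics.

    The noise base is subtracted from the power spectrum, and the lag that
    maximizes the autocorrelation of the residual is returned. Returns 0
    when no lag gives a positive autocorrelation.
    """
    residual = [max(p - n, 0.0) for p, n in zip(power, noise_base)]
    n = len(residual)
    best_lag, best_value = 0, 0.0
    for lag in range(min_lag, n // 2 + 1):
        value = sum(residual[i] * residual[i + lag] for i in range(n - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return best_lag


# 16 bins with harmonic peaks at bins 4, 8, and 12 on top of a flat noise floor.
power = [1.0] * 16
for k in (4, 8, 12):
    power[k] = 10.0
noise_base = [1.0] * 16
print(estimate_harmonic_spacing(power, noise_base))  # 4
```

Converting the returned bin spacing to a pitch frequency would require the bin width, which depends on the sampling rate and transform size and is not specified in this part of the text.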

[0027] The speech enhancement device of the present invention comprises DC component generation means that generates, in the result of subtracting the noise base from the power spectrum of the input signal, a pseudo peak with a predetermined power at the DC component. The pitch estimation means estimates the speech pitch from the power spectrum in which the DC component generation means has generated the pseudo peak.

[0028] According to this configuration, a pseudo peak with a predetermined power is generated at the DC component of the result of subtracting the noise base from the input signal's power spectrum, the speech pitch is estimated from the autocorrelation function of the spectrum with the generated DC component, and the speech pitch harmonic structure contained in the comb filter is corrected using the estimated speech pitch. Even when the speech spectrum has few harmonic peaks, pitch information can therefore be obtained and the pitch harmonic structure of the comb filter restored, achieving speech enhancement with little speech distortion.
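The effect of the DC pseudo peak in paragraphs [0027] and [0028] can be illustrated with a spectrum that has only one surviving harmonic. Without the pseudo peak the autocorrelation has no nonzero lag products at all; with a peak placed at bin 0, the single harmonic pairs with it and reveals the spacing. The helper names, the lag search range, and the power value are assumptions made for this sketch.

```python
def autocorr_peak_lag(spectrum, min_lag=2):
    """Return the lag maximizing the autocorrelation of `spectrum`, or 0 if none is positive."""
    n = len(spectrum)
    best_lag, best_value = 0, 0.0
    for lag in range(min_lag, n // 2 + 1):
        value = sum(spectrum[i] * spectrum[i + lag] for i in range(n - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return best_lag


def add_dc_pseudo_peak(residual, dc_power):
    """Return a copy of the noise-subtracted spectrum with a pseudo peak at the DC bin."""
    boosted = list(residual)
    boosted[0] = dc_power
    return boosted


# Noise-subtracted spectrum with a single remaining harmonic at bin 4.
residual = [0.0] * 12
residual[4] = 9.0

print(autocorr_peak_lag(residual))                           # 0 (no pitch found)
print(autocorr_peak_lag(add_dc_pseudo_peak(residual, 9.0)))  # 4
```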

[0029] The speech enhancement device of the present invention comprises noise characteristic estimation means that calculates a moving average of the number of frequency regions whose power is equal to or greater than a predetermined threshold in the result of subtracting the noise base from the power spectrum of the input signal. The second comb filter creation means creates the second comb filter from the result of judging, on the basis of the moving average, whether the input signal contains speech.

[0030] According to this configuration, the distribution of the noise level of the input signal is detected, the criterion for generating a comb filter from the speech spectrum is determined based on this distribution, and pitch information is obtained from the comb filter thus created. Pitch information suited to the state of the noise can therefore be obtained to create the comb filter, achieving speech enhancement with little speech distortion.

[0031] The speech enhancement device of the present invention comprises noise characteristic estimation means that calculates a moving average of the number of frequency regions whose power is equal to or greater than a predetermined threshold in the result of subtracting the noise base from the power spectrum of the input signal. When the moving average calculated by the noise characteristic estimation means is equal to or less than a predetermined value, the second comb filter creation means creates a second comb filter whose stopband is a predetermined frequency region.

[0032] According to this configuration, frequency components are selected based on the noise characteristic estimation result, and in the second comb filter all of the selected frequency regions are converted into stopbands. This reduces false pitch harmonics generated by noise with large variance. If the pitch harmonic structure is then restored with reference to the pitch harmonics in the low-frequency region, where false pitch harmonics are unlikely to arise, the pitch harmonic structure can be restored accurately.

[0033] The speech enhancement device of the present invention comprises SNR estimation means that calculates a signal-to-noise ratio from the power spectrum of the input signal and the noise base. The suppression means determines the amount of noise suppression for the frequency-divided spectrum from the signal-to-noise ratio.

[0034] According to this configuration, in the passbands and stopbands of the corrected comb filter, the amount of noise base subtracted from the input speech power spectrum and the degree of noise attenuation are adjusted according to the magnitude of the SNR estimate. Appropriate noise attenuation is therefore performed even in environments with different SNRs, achieving speech enhancement with little speech distortion and little residual noise.

[0035] In the speech enhancement device of the present invention, the SNR estimation means calculates the level of the speech component from the moving average of the power spectrum of the input signal, calculates the level of the noise component from the noise base estimate multiplied by a weighting coefficient for each frequency component, and calculates the signal-to-noise ratio from the ratio of the speech component level to the noise component level.

[0036] According to this configuration, the speech level is calculated by subtracting the noise base from the moving average of the input speech power spectrum, which reduces the influence of noise and allows an accurate speech level to be calculated even in low-SNR environments. In addition, multiplying each frequency component of the noise base estimate by a weighting coefficient allows attenuation appropriate to different kinds of noise, reducing speech distortion.

[0037] The speech enhancement device of the present invention comprises fluctuation suppression means that calculates the deviation between the signal-to-noise ratio and the moving average of the signal-to-noise ratio and updates the moving average of the signal-to-noise ratio using that deviation. The suppression means determines the amount of noise suppression for the frequency-divided spectrum from the moving average of the signal-to-noise ratio updated by the fluctuation suppression means.

[0038] According to this configuration, the deviation between the SNR estimate and the long-term moving average of the SNR estimate is calculated, and the long-term moving average plus a part of the deviation is used as the SNR estimate. Fluctuations in the SNR are thereby effectively suppressed, and the level of noise attenuation can be adjusted stably according to the magnitude of the SNR.
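The update described in paragraphs [0037] and [0038] can be sketched as follows: the smoothed SNR is the long-term moving average plus a fraction of the deviation between the new estimate and that average. The function name and the value of `fraction` are assumptions; the source says only that "a part of" the deviation is added.

```python
def update_snr_average(snr_estimate, moving_avg, fraction=0.25):
    """Move the long-term SNR average toward the new estimate by part of the deviation."""
    deviation = snr_estimate - moving_avg
    return moving_avg + fraction * deviation


# A sudden jump in the raw SNR estimate only moves the smoothed value part of the way.
print(update_snr_average(20.0, 10.0, fraction=0.25))  # 12.5

# Applied frame by frame, the smoothed value follows the raw estimates gradually.
smoothed = 10.0
for raw in [20.0, 20.0, 5.0, 20.0]:
    smoothed = update_snr_average(raw, smoothed)
    print(smoothed)
```

A smaller `fraction` gives a steadier attenuation level at the cost of slower reaction to genuine changes in the SNR.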

[0039] The speech enhancement device of the present invention comprises noise base updating means that calculates, in predetermined time units, two moving averages of the noise base with different update speeds, changes the update condition of the first moving average using the second moving average, whose update speed is faster than that of the first, and outputs the first moving average as the noise base estimate.

[0040] According to this configuration, estimating the noise base with a fast-updating moving average coefficient makes it possible to track rapid changes in the noise level even during speech intervals. In addition, the slowly updated noise base is updated on the basis of the fast-updating noise base, so that the noise base can be estimated accurately and the noise base update is prevented from stalling when the noise level changes abruptly.
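One way to picture the two-speed update of paragraphs [0039] and [0040] is sketched below. The source states only that the faster moving average changes the update condition of the slower one; the specific gating rule here (update the slow average when the frame power is below `gate` times the larger of the two averages), the coefficient values, and all names are assumptions chosen for illustration, not the patented rule.

```python
def update_noise_base(power, slow, fast, slow_coeff=0.98, fast_coeff=0.7, gate=2.0):
    """Update a slow (output) and a fast (tracking) noise base for one frequency bin.

    Returns the new (slow, fast) pair. The gating rule is hypothetical.
    """
    # The fast average follows the input power on every frame.
    fast = fast_coeff * fast + (1.0 - fast_coeff) * power
    # The fast average relaxes the update condition of the slow average, so a
    # sudden rise in noise level does not block updates indefinitely.
    if power < gate * max(slow, fast):
        slow = slow_coeff * slow + (1.0 - slow_coeff) * power
    return slow, fast


# The noise level jumps from 1.0 to 10.0 and stays there.
slow, fast = 1.0, 1.0
for frame in range(50):
    slow, fast = update_noise_base(10.0, slow, fast)
print(slow > 5.0)  # True: the slow estimate has resumed tracking the new level
```

With a gate that depended on the slow average alone, the condition `10.0 < 2.0 * 1.0` would never hold in this scenario and the estimate would stay at 1.0, which is the stalled-update situation the paragraph describes.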

[0041] The wireless communication apparatus of the present invention comprises any of the speech enhancement devices described above.

[0042] According to this configuration, peaks that are highly likely to be speech peaks are extracted from the spectrum of the speech signal to create a comb filter for estimating the speech pitch, and accurate speech pitch information is obtained from this comb filter. In addition, a comb filter for suppressing the noise signal is created by extracting as much speech information as possible, and this comb filter is used so that speech spectrum peaks buried in noise are not suppressed. A comb filter in which the missing speech pitch harmonic structure has been filled in on the basis of correct speech pitch information can thus be created, and suppressing the noise signal with this comb filter achieves speech enhancement with little speech distortion.

[0043] The noise suppression device of the present invention comprises: frequency division means for outputting a frequency-divided spectrum obtained by dividing the spectrum of an input signal into predetermined frequency units; noise separation comb filter creation means for creating a noise separation comb filter whose passbands are the frequency regions judged, based on the frequency-divided spectrum, to be silent; suppression means for separating the noise component of the frequency-divided spectrum using the noise separation comb filter; and speech frequency synthesis means for synthesizing the frequency-divided spectrum from which the noise component has been separated into a spectrum signal that is continuous in the frequency domain.

[0044] According to this configuration, generating a comb filter dedicated to noise allows the characteristics of the noise to be extracted to the fullest extent.

[0045] In the noise suppression device of the present invention, the noise separation means multiplies, in the passbands of the noise separation comb filter, the real part and the imaginary part of the input speech spectrum by separate random numbers and by the noise base estimate.

[0046] According to this configuration, the noise component is not attenuated in the stopbands of the noise separation comb filter, and in the passbands of the noise separation comb filter the real part and the imaginary part of the input speech spectrum are multiplied by separate random numbers and by the noise base estimate. The amplitude and phase of the real and imaginary parts of the noise component are thereby fully randomized, and good noise separation characteristics are obtained.
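The passband processing of paragraphs [0045] and [0046] can be sketched as below: in bins marked as noise passbands, the real and imaginary parts are each multiplied by an independent random number and by the noise base estimate for that bin. The function name, the uniform range of the random numbers, and the choice to leave the other bins unchanged are assumptions of this sketch; the source does not specify them.

```python
import random


def randomize_noise_bins(spectrum, noise_pass_mask, noise_base, rng):
    """Randomize the real and imaginary parts of bins in the noise passbands.

    `spectrum` holds complex values, `noise_pass_mask` marks noise passbands,
    `noise_base` is the per-bin noise base estimate, and `rng` is a
    `random.Random` instance. Other bins are returned unchanged (assumed).
    """
    out = []
    for value, is_noise, base in zip(spectrum, noise_pass_mask, noise_base):
        if is_noise:
            # Separate random numbers for the real and imaginary parts.
            re = value.real * rng.uniform(-1.0, 1.0) * base
            im = value.imag * rng.uniform(-1.0, 1.0) * base
            out.append(complex(re, im))
        else:
            out.append(value)
    return out


rng = random.Random(0)   # seeded only to make the example repeatable
spectrum = [complex(1.0, 2.0), complex(3.0, -1.0)]
result = randomize_noise_bins(spectrum, [True, False], [0.5, 0.5], rng)
print(result[1] == spectrum[1])  # True: the bin outside the passband is untouched
```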

[0047] The noise suppression device of the present invention comprises noise component storage means that stores the spectral components of the input speech in the stopbands of the speech separation comb filter. The noise separation means uses the spectral components stored in the memory in the passbands of the noise separation comb filter.

[0048] According to this configuration, the spectral components of the input speech in the stopbands of the noise separation comb filter are stored in memory, and the stored values are used in the passbands of the noise separation comb filter. Pseudo noise with characteristics close to those of the actual noise can thus be reconstructed, and good noise separation characteristics are obtained.

[0049] The wireless communication device of the present invention comprises any of the noise suppression devices described above.

[0050] According to this configuration, generating a comb filter dedicated to noise makes it possible to extract the noise characteristics to the fullest extent.

[0051] The sound source separation device of the present invention comprises any of the speech enhancement devices described above and any of the noise suppression devices described above.

[0052] According to this configuration, a comb filter for estimating the speech pitch is created by extracting, from the spectrum of the speech signal, the peaks that are highly likely to be speech peaks, and accurate speech pitch information is obtained from this comb filter. In addition, a comb filter that extracts as much speech information as possible and performs suppression on the input signal is created, and using this comb filter avoids suppressing speech spectrum peaks buried in noise. A comb filter in which the missing speech pitch harmonic structure has been filled in on the basis of correct speech pitch information can thus be created, and suppressing the noise signal with this comb filter yields speech enhancement with little speech distortion. Furthermore, according to this configuration, generating a comb filter dedicated to noise makes it possible to extract the noise characteristics to the fullest extent.

[0053] The speech enhancement method of the present invention comprises: a frequency division step of outputting a frequency-divided spectrum obtained by dividing the spectrum of an input signal into predetermined frequency units; a first comb filter creation step of creating, on the basis of the frequency-divided spectrum, a first comb filter that attenuates signals in frequency regions judged to be silent; a second comb filter creation step of creating, on the basis of the frequency-divided spectrum, a second comb filter, which is a filter from which more noise peaks have been removed than from the first comb filter; a comb filter correction step of correcting the speech pitch harmonic structure contained in the first comb filter using the speech pitch estimated from the second comb filter and the frequency-divided spectrum; a suppression step of suppressing noise in the frequency-divided spectrum using the first comb filter corrected in the comb filter correction step; and a speech frequency synthesis step of synthesizing the noise-suppressed frequency-divided spectrum into a spectrum signal that is continuous in the frequency domain.

[0054] According to this method, a comb filter for estimating the speech pitch is created by extracting, from the spectrum of the speech signal, the peaks that are highly likely to be speech peaks, and accurate speech pitch information is obtained from this comb filter. In addition, a comb filter that extracts as much speech information as possible and performs suppression on the input signal is created, and using this comb filter avoids suppressing speech spectrum peaks buried in noise. A comb filter in which the missing speech pitch harmonic structure has been filled in on the basis of correct speech pitch information can thus be created, and applying suppression to the speech signal with this comb filter yields speech enhancement with little speech distortion.

[0055] The noise suppression method of the present invention comprises: a frequency division step of outputting a frequency-divided spectrum obtained by dividing the spectrum of an input signal into predetermined frequency units; a noise separation comb filter creation step of creating, on the basis of the frequency-divided spectrum, a noise separation comb filter whose pass band consists of the frequency regions judged to be silent; a suppression step of separating the noise component of the frequency-divided spectrum using the noise separation comb filter; and a speech frequency synthesis step of synthesizing the frequency-divided spectrum from which the noise component has been separated into a spectrum signal that is continuous in the frequency domain.

[0056] According to this method, generating a comb filter dedicated to noise makes it possible to extract the noise characteristics to the fullest extent.

[0057] The speech enhancement program of the present invention causes a computer to execute: a frequency division step of outputting a frequency-divided spectrum obtained by dividing the spectrum of an input signal into predetermined frequency units; a first comb filter creation step of creating, on the basis of the frequency-divided spectrum, a first comb filter that attenuates signals in frequency regions judged to be silent; a second comb filter creation step of creating, on the basis of the frequency-divided spectrum, a second comb filter, which is a filter from which more noise peaks have been removed than from the first comb filter; a comb filter correction step of correcting the speech pitch harmonic structure contained in the first comb filter using the speech pitch estimated from the second comb filter and the frequency-divided spectrum; a suppression step of suppressing noise in the frequency-divided spectrum using the first comb filter corrected by the comb filter correction means; and a speech frequency synthesis step of synthesizing the noise-suppressed frequency-divided spectrum into a spectrum signal that is continuous in the frequency domain.

[0058] According to this configuration, a comb filter for estimating the speech pitch is created by extracting, from the spectrum of the speech signal, the peaks that are highly likely to be speech peaks, and accurate speech pitch information is obtained from this comb filter. In addition, a comb filter that extracts as much speech information as possible and suppresses the noise signal is created, and using this comb filter avoids suppressing speech spectrum peaks buried in noise. A comb filter in which the missing speech pitch harmonic structure has been filled in on the basis of correct speech pitch information can thus be created, and suppressing the noise signal with this comb filter yields speech enhancement with little speech distortion.

[0059] The noise separation program of the present invention causes a computer to execute: a frequency division step of outputting a frequency-divided spectrum obtained by dividing the spectrum of an input signal into predetermined frequency units; a noise separation comb filter creation step of creating, on the basis of the frequency-divided spectrum, a noise separation comb filter whose pass band consists of the frequency regions judged to be silent; a suppression step of separating the noise component of the frequency-divided spectrum using the noise separation comb filter; and a speech frequency synthesis step of synthesizing the frequency-divided spectrum from which the noise component has been separated into a spectrum signal that is continuous in the frequency domain.

[0060] According to this configuration, generating a comb filter dedicated to noise makes it possible to extract the noise characteristics to the fullest extent.

[0061] The server device of the present invention stores a speech enhancement program and outputs the speech enhancement program on request, the program being characterized by causing a computer to execute: a frequency division step of outputting a frequency-divided spectrum obtained by dividing the spectrum of an input signal into predetermined frequency units; a first comb filter creation step of creating, on the basis of the frequency-divided spectrum, a first comb filter that attenuates signals in frequency regions judged to be silent; a second comb filter creation step of creating, on the basis of the frequency-divided spectrum, a second comb filter, which is a filter from which more noise peaks have been removed than from the first comb filter; a comb filter correction step of correcting the speech pitch harmonic structure contained in the first comb filter using the speech pitch estimated from the second comb filter and the frequency-divided spectrum; a suppression step of suppressing noise in the frequency-divided spectrum using the first comb filter corrected in the comb filter correction step; and a speech frequency synthesis step of synthesizing the noise-suppressed frequency-divided spectrum into a spectrum signal that is continuous in the frequency domain.

[0062] According to this configuration, a comb filter for estimating the speech pitch is created by extracting, from the spectrum of the speech signal, the peaks that are highly likely to be speech peaks, and accurate speech pitch information is obtained from this comb filter. In addition, a comb filter that extracts as much speech information as possible and performs suppression on the input signal is created, and using this comb filter avoids suppressing speech spectrum peaks buried in noise. A comb filter in which the missing speech pitch harmonic structure has been filled in on the basis of correct speech pitch information can thus be created, and suppressing the noise signal with this comb filter yields speech enhancement with little speech distortion.

[0063] The server device of the present invention stores a noise separation program and outputs the noise separation program on request, the program being characterized by causing a computer to execute: a frequency division step of outputting a frequency-divided spectrum obtained by dividing the spectrum of an input signal into predetermined frequency units; a noise separation comb filter creation step of creating, on the basis of the frequency-divided spectrum, a noise separation comb filter whose pass band consists of the frequency regions judged to be silent; a suppression step of separating the noise component of the frequency-divided spectrum using the noise separation comb filter; and a speech frequency synthesis step of synthesizing the frequency-divided spectrum from which the noise component has been separated into a spectrum signal that is continuous in the frequency domain.

[0064] According to this configuration, generating a comb filter dedicated to noise makes it possible to extract the noise characteristics to the fullest extent.

[0065]

[Embodiments of the Invention] The gist of the present invention is to generate, on the basis of the frequency-divided spectrum of the speech signal, a comb filter from which more noise peaks have been removed than from the comb filter used for suppression, to obtain the pitch information of the speech signal using this comb filter, and to fill in the speech pitch of the comb filter.

[0066] Embodiments of the present invention are described in detail below with reference to the drawings.

(Embodiment 1) FIG. 1 is a block diagram showing the configuration of the speech enhancement device according to Embodiment 1 of the present invention. In FIG. 1, the speech enhancement device 100 is mainly composed of a time division unit 101, a windowing unit 102, an FFT unit 103, a frequency division unit 104, a noise base estimation unit 105, a first speech/non-speech identification unit 106, a second speech/non-speech identification unit 107, a first comb filter generation unit 108, a second comb filter generation unit 109, a voiced/unvoiced discrimination unit 110, a pitch estimation unit 111, a pitch harmonic structure restoration unit 112, a comb filter correction unit 113, a speech separation coefficient calculation unit 114, a multiplication unit 115, a speech frequency synthesis unit 116, and an IFFT unit 117.

[0067] The time division unit 101 builds frames by dividing the input speech signal into predetermined time units and outputs them to the windowing unit 102. The windowing unit 102 applies windowing, using a Hanning window or the like, to each frame output from the time division unit 101 and outputs the result to the FFT unit 103. The FFT unit 103 performs an FFT (Fast Fourier Transform) on the speech signal output from the windowing unit 102 and outputs the speech spectrum signal to the frequency division unit 104.
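The framing, windowing, and FFT stages described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation disclosed in the patent: the function names, the 50% frame overlap (`hop=256`), and the hand-written radix-2 FFT are all hypothetical choices. Only the Hanning window and the FFT length of 512 are taken from the text.

```python
import cmath
import math

def split_frames(signal, frame_len, hop):
    """Cut the signal into frames of frame_len samples, advancing by hop (assumed overlap)."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

def hanning(n):
    """Symmetric Hanning window of length n (n >= 2)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def analyze(signal, frame_len=512, hop=256):
    """Return one complex spectrum per windowed frame."""
    win = hanning(frame_len)
    return [fft([s * w for s, w in zip(frame, win)])
            for frame in split_frames(signal, frame_len, hop)]
```

Each element of the returned list plays the role of the spectrum handed to the frequency division unit 104 for one frame.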

[0068] The frequency division unit 104 divides the speech spectrum output from the FFT unit 103 into individual frequency components and outputs the speech spectrum S_f(k) of each frequency component (where k is the number identifying the frequency component) to the noise base estimation unit 105, the first speech/non-speech identification unit 106, the second speech/non-speech identification unit 107, and the multiplication unit 115. A frequency component here is the smallest unit of the speech spectrum divided into predetermined frequency units. S_f(k) is given by equation (1).

[0069]

[Equation 1] Here, Re{D_f(k)}² represents the real part of the spectrum of the input speech signal after the FFT, and Im{D_f(k)}² represents the imaginary part of the spectrum of the input speech signal after the FFT.

[0070] When a determination result indicating that the frame contains no speech component is output, the noise base estimation unit 105 updates the noise base using the short-time power spectrum of each frequency component of the speech spectrum output from the frequency division unit 104.

[0071] Specifically, the noise base in each frequency component is estimated using equation (2), and the estimated noise base is output to the first speech/non-speech identification unit 106 and the second speech/non-speech identification unit 107.

[0072]

[Equation 2] Here, P_base(n-1, k) is the noise base, n is the number identifying the frame being processed, and k is the number identifying the frequency component. Θ_base is the threshold for discriminating between speech and noise, and α is the moving average coefficient.
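Because equation (2) itself is not reproduced in this text, the following is only a sketch of one plausible noise base update that uses the symbols defined above. The exact form is an assumption: here a bin is treated as noise when its power does not exceed Θ_base times the previous noise base, in which case the noise base is smoothed with the moving average coefficient α, and it is left unchanged otherwise. The function name and the default values are hypothetical.

```python
def update_noise_base(p_base_prev, power, theta_base=2.0, alpha=0.1):
    """Per-bin noise base update (assumed form, not the patent's equation (2)).

    p_base_prev: P_base(n-1, k) for every bin k
    power:       short-time power of the current frame for every bin k
    """
    p_base = []
    for p_prev, s in zip(p_base_prev, power):
        if s <= theta_base * p_prev:
            # Bin looks like noise: move the estimate toward the new power.
            p_base.append((1.0 - alpha) * p_prev + alpha * s)
        else:
            # Bin looks like speech: keep the previous estimate.
            p_base.append(p_prev)
    return p_base
```

A small α makes the estimate track slowly varying background noise while ignoring short bursts.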

[0073] The first speech/non-speech identification unit 106 and the second speech/non-speech identification unit 107 each judge a portion to be a sounded portion containing a speech component when the difference between the speech spectrum signal output from the frequency division unit 104 and the noise base value output from the noise base estimation unit 105 is equal to or greater than a predetermined threshold, and otherwise judge it to be a silent portion containing only noise and no speech component.

[0074] The first speech/non-speech identification unit 106 outputs its judgment result to the first comb filter generation unit 108, and the second speech/non-speech identification unit 107 outputs its judgment result to the second comb filter generation unit 109.

[0075] The first comb filter generation unit 108 sets the threshold of the first speech/non-speech identification unit 106 low so that as much speech pitch harmonic information as possible is extracted, generates a comb filter that emphasizes the speech pitch harmonic structure on the basis of the presence or absence of a speech component in each frequency component, and outputs the resulting comb filter to the comb filter correction unit 113.

[0076] Specifically, the first comb filter COMB_low(k) is generated using the following equation (3).

[0077]

[Equation 3] Here, Θ_low is the threshold for the first comb filter. HB is the FFT length, that is, the number of data points subjected to the fast Fourier transform; for example, HB = 512.

[0078] The second comb filter generation unit 109 sets the threshold of the second speech/non-speech identification unit 107 high so that the result is not affected by noise information, generates, on the basis of the presence or absence of a speech component in each frequency component, a comb filter that serves as the reference for restoring the speech pitch harmonic structure, and outputs the resulting comb filter to the voiced/unvoiced discrimination unit 110 and the pitch harmonic structure restoration unit 112. Specifically, the second comb filter is generated using the following equation (4).

[Equation 4] Here, Θ_high is the threshold for the second comb filter, and Θ_high is larger than Θ_low.
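The relationship between the two comb filters can be illustrated with a short sketch. Equations (3) and (4) are not reproduced in this text, so the decision rule below is an assumption based on the description of the speech/non-speech identification units: a bin is a pass band (1) when the spectrum exceeds the noise base by at least the threshold, and a stop band (0) otherwise. The function name `make_comb` and the numeric thresholds are hypothetical.

```python
def make_comb(power, p_base, theta):
    """Binary comb filter: 1 where (power - noise base) >= theta, else 0 (assumed rule)."""
    return [1 if s - p >= theta else 0 for s, p in zip(power, p_base)]

# Toy spectrum with one strong and one weak harmonic over a flat noise base.
power = [1.0, 9.0, 1.2, 4.0, 1.0]
p_base = [1.0] * len(power)

comb_low = make_comb(power, p_base, theta=2.0)   # low threshold: keeps more peaks
comb_high = make_comb(power, p_base, theta=5.0)  # high threshold: keeps only reliable peaks
```

With the same rule and Θ_high > Θ_low, every pass band of the second comb filter is also a pass band of the first, which is why the second filter can act as a conservative reference.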

[0079] The voiced/unvoiced discrimination unit 110 discriminates between voiced and unvoiced on the basis of the result output from the second comb filter generation unit 109 and outputs the discrimination result to the pitch estimation unit 111.

[0080] Specifically, using the following equations (5) and (6), the input speech signal is split into a low-frequency region and a high-frequency region, and in each region the number of frequency components in the pass band of the second comb filter (that is, COMB_high(k) = 1) is counted.

[0081]

[Equation 5]

[0082]

[Equation 6] Here, the frame is judged to be voiced when both equation (5) and equation (6) exceed their set thresholds, or when equation (5) exceeds its set threshold and equation (6) is below its set threshold; otherwise it is judged to be unvoiced.
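The decision rule above can be written out as a small sketch. The boundary between the low and high regions (`split`) and the two count thresholds are not given in this text, so they are hypothetical parameters here, as is the function name.

```python
def is_voiced(comb_high, split, count_th_low, count_th_high):
    """Voiced/unvoiced decision from pass-band counts of the second comb filter.

    comb_high:     list of 0/1 values, COMB_high(k)
    split:         first bin index of the high-frequency region (assumed)
    count_th_low:  threshold for the low-region count, equation (5) (assumed value)
    count_th_high: threshold for the high-region count, equation (6) (assumed value)
    """
    n_low = sum(comb_high[:split])    # pass-band bins in the low region
    n_high = sum(comb_high[split:])   # pass-band bins in the high region
    both_large = n_low > count_th_low and n_high > count_th_high
    low_only = n_low > count_th_low and n_high < count_th_high
    return both_large or low_only
```

In both voiced cases the low-region count must exceed its threshold, so the low-frequency harmonics effectively drive the decision.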

[0083] When the frame is judged to be voiced, speech pitch estimation and restoration of the speech pitch harmonic structure are performed; when it is judged to be unvoiced, neither is performed.

[0084] The pitch estimation unit 111 estimates the speech pitch period from the speech spectrum output from the frequency division unit 104 and outputs the estimation result to the pitch harmonic structure restoration unit 112.

[0085] The pitch harmonic structure restoration unit 112 restores the pitch on the basis of the estimation result output from the pitch estimation unit 111 and the result of the second comb filter generation unit 109, and outputs the result to the comb filter correction unit 113.

[0086] Specifically, the speech pitch harmonic structure is restored in the following steps. In the first step, the power peak of the speech spectrum is extracted in each pass band of the second comb filter COMB_high(k), and a pitch reference comb filter COMB_int(k), which serves as the reference for restoring the pitch harmonic structure, is generated from all the pass bands.

[0087] In the second step, the interval between adjacent peaks of the pitch reference comb filter is calculated, and where it exceeds a predetermined threshold (for example, 1.5 times the pitch period), the missing pitch harmonics are inserted on the basis of the pitch estimation result to generate a pitch harmonic insertion comb filter COMB_rec(k).

[0088] In the third step, a pitch harmonic restoration comb filter COMB_ext(k) is generated by widening the teeth of the pitch harmonic insertion comb filter, that is, its pass bands, according to the value of the pitch period.
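The three restoration steps can be sketched as follows. This is an illustrative reading, not the disclosed implementation: the pitch is expressed as a spacing in frequency bins (`pitch_bins`), inserted harmonics are placed at multiples of that spacing after the preceding peak, and the tooth half-width defaults to a quarter of the pitch spacing. All of these choices, and the function names, are assumptions; only the 1.5× gap threshold comes from the text.

```python
def passband_runs(comb):
    """Return (start, end) pairs, end exclusive, for each run of 1s in comb."""
    runs, start = [], None
    for k, v in enumerate(comb):
        if v and start is None:
            start = k
        elif not v and start is not None:
            runs.append((start, k))
            start = None
    if start is not None:
        runs.append((start, len(comb)))
    return runs

def restore_harmonics(comb_high, power, pitch_bins, gap_factor=1.5, half_width=None):
    n = len(comb_high)
    # Step 1: one power peak per pass band of COMB_high (positions of COMB_int).
    peaks = [max(range(a, b), key=lambda k: power[k])
             for a, b in passband_runs(comb_high)]
    # Step 2: where neighbouring peaks are too far apart, insert harmonics
    # at pitch spacing (positions of COMB_rec).
    filled = list(peaks[:1])
    for prev, cur in zip(peaks, peaks[1:]):
        if cur - prev > gap_factor * pitch_bins:
            k = prev + pitch_bins
            while cur - k > 0.5 * pitch_bins:
                filled.append(int(round(k)))
                k += pitch_bins
        filled.append(cur)
    # Step 3: widen each tooth according to the pitch (COMB_ext).
    if half_width is None:
        half_width = max(1, int(pitch_bins // 4))  # assumed width rule
    comb_ext = [0] * n
    for p in filled:
        for k in range(max(0, p - half_width), min(n, p + half_width + 1)):
            comb_ext[k] = 1
    return peaks, filled, comb_ext
```

For example, with peaks at bins 10, 20, and 50 and a pitch spacing of 10 bins, the 30-bin gap exceeds 1.5 × 10, so teeth are inserted at bins 30 and 40 before widening.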

[0089] The comb filter correction unit 113 corrects the comb filter by combining the result output from the pitch harmonic structure restoration unit 112 with the result output from the first comb filter generation unit 108, and outputs the result to the speech separation coefficient calculation unit 114.

[0090] Specifically, the pass bands of the pitch harmonic restoration comb filter COMB_ext(k) are compared with the pass bands of the first comb filter COMB_low(k). Where they overlap, the pass band of the first comb filter becomes a pass band of the corrected comb filter; everywhere else becomes a stop band of the corrected comb filter. The corrected comb filter COMB_res(k) is generated in this way.
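One way to read this correction rule is sketched below: every contiguous pass band of COMB_low that overlaps a pass band of COMB_ext is kept whole, and everything else becomes a stop band. This run-based interpretation is an assumption; a stricter reading would keep only the bins where both filters are 1 (an element-wise AND). The function name is hypothetical.

```python
def correct_comb(comb_low, comb_ext):
    """Keep each pass-band run of comb_low that overlaps any pass band of comb_ext."""
    n = len(comb_low)
    comb_res = [0] * n
    k = 0
    while k < n:
        if comb_low[k]:
            end = k
            while end < n and comb_low[end]:
                end += 1                      # end is one past the run
            if any(comb_ext[k:end]):
                comb_res[k:end] = [1] * (end - k)
            k = end
        else:
            k += 1
    return comb_res
```

Pass bands of the first comb filter that have no counterpart in the restored harmonic structure are treated as noise peaks and removed.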

[0091] The speech separation coefficient calculation unit 114 multiplies the comb filter generated by the comb filter correction unit 113 by a separation coefficient based on the frequency characteristics, sets the separation coefficient of the input signal for each frequency component, and outputs the separation coefficient of each frequency component to the multiplication unit 115.

[0092] For example, the separation coefficient seps(k) can be calculated from the following equation (7) and multiplied with the input signal.

[0093]

[Equation 7] Here, gc is a constant, k is a variable identifying the frequency component, and γ is a coefficient that adjusts the amount of noise base subtraction. P_MAX(n) is the maximum value of P_base(n, k). The term gc·P_MAX(n)/P_base(n, k) is an attenuation coefficient obtained by normalizing the noise base estimate for each frame and taking its reciprocal, and COMB_res(k) is the corrected comb filter.

[0094] FIGS. 2 and 3 show an example of the speech enhancement result of the present invention, illustrating the comb filter generation, pitch harmonic structure restoration, and comb filter correction processes described above, together with the speech separation coefficient (attenuation coefficient). FIG. 2 shows an example of the comb filter created by the speech enhancement device according to this embodiment. In FIG. 2, the vertical axis represents the spectral power and the attenuation of the filter, and the horizontal axis represents frequency.

[0095] The comb filter has the attenuation characteristic shown as S1, and the attenuation characteristic is set for each frequency component. The first comb filter generation unit 108 creates a comb filter whose attenuation characteristic attenuates signals in frequency regions that contain no speech component and does not attenuate signals in frequency regions that contain a speech signal.

[0096] When the comb filter with attenuation characteristic S1 is applied to the speech spectrum S2, which contains noise components, the signals in the frequency regions containing noise components are attenuated and their power decreases, while the portions containing the speech signal are not attenuated and their power is unchanged. In the resulting speech spectrum, the frequency regions of the noise components are lowered and the peaks are preserved and emphasized, so that a noise-suppressed speech spectrum S3 that retains the pitch harmonic information is output.

[0097] FIG. 3 shows an example of comb filter restoration in the speech processing device according to this embodiment. In FIG. 3, the vertical axis represents the degree of attenuation and the horizontal axis represents the frequency component. Specifically, the horizontal axis has 256 frequency components and covers the range from 0 kHz to 4 kHz.

[0098] C1 is the generated comb filter, C2 is the comb filter obtained by restoring the pitch in comb filter C1, and C3 is the comb filter obtained by correcting the pitch width in comb filter C2.

[0099] In comb filter C1, pitch information is missing in frequency components 100 to 140. The pitch harmonic structure restoration unit 112 fills in the pitch harmonic information in frequency components 100 to 140 of comb filter C1 on the basis of the pitch period information estimated by the pitch estimation unit 111. Comb filter C2 is obtained as a result.

[0100] Next, the pitch harmonic structure restoration unit 112 corrects the width of the pitch harmonics of comb filter C2 on the basis of the speech spectrum output from the frequency division unit 104. Comb filter C3 is obtained as a result.

[0101] The multiplication unit 115 multiplies the speech spectrum output from the frequency division unit 104 by the separation coefficient output from the speech separation coefficient calculation unit 114 for each frequency component, and outputs the result of the multiplication to the speech frequency synthesis unit 116.

[0102] The speech frequency synthesis unit 116 synthesizes the spectra of the individual frequency components output from the multiplication unit 115 into a speech spectrum that is continuous in the frequency domain for each predetermined processing time unit, and outputs it to the IFFT unit 117.

[0103] The IFFT unit 117 performs an IFFT (Inverse Fast Fourier Transform) on the speech spectrum output from the speech frequency synthesis unit 116 and outputs the signal converted back into a speech signal.
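The multiplication, synthesis, and inverse transform stages can be sketched for a single frame as follows. This is a minimal illustration with hypothetical function names; the small radix-2 FFT is included only to keep the example self-contained, and the inverse transform is computed with the standard conjugation identity. How successive frames are recombined is not described in this passage and is not shown.

```python
import cmath

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def ifft(spectrum):
    """Inverse FFT via conj(FFT(conj(X))) / N."""
    n = len(spectrum)
    tmp = fft([v.conjugate() for v in spectrum])
    return [v.conjugate() / n for v in tmp]

def enhance_frame(spectrum, seps):
    """Weight every bin by its separation coefficient and return the time-domain frame.

    seps should be symmetric (seps[k] == seps[N - k]) so that the output stays real;
    only the real part is returned.
    """
    weighted = [d * g for d, g in zip(spectrum, seps)]
    return [v.real for v in ifft(weighted)]
```

With all coefficients equal to 1 the frame is returned unchanged, and coefficients below 1 in the stop bands attenuate the corresponding frequency components.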

[0104] As described above, according to the speech enhancement device of this embodiment, a comb filter for estimating the speech pitch is created by extracting, from the spectrum of the speech signal, the peaks that are highly likely to be speech peaks, and accurate speech pitch information is obtained from this comb filter. In addition, a comb filter that extracts as much speech information as possible and suppresses the noise signal is created, and using this comb filter avoids suppressing speech spectrum peaks buried in noise. A comb filter in which the missing speech pitch harmonic structure has been filled in on the basis of correct speech pitch information can thus be created, and suppressing the noise signal with this comb filter yields speech enhancement with little speech distortion.

【０１０５】 Specifically, the voice enhancement device of this embodiment of the present invention estimates the noise base and performs speech/non-speech discrimination for each frequency component, and can thereby generate, in the frequency domain, a first comb filter that extracts speech pitch harmonic information.

【０１０６】 In addition, because the second comb filter supplies the basic structure of the pitch harmonics, the harmonic components of the speech do not fall outside the passband of the comb filter as a result of pitch estimation errors.

【０１０７】 In addition, voiced/unvoiced discrimination is performed based on the result of generating the second comb filter, the speech pitch is estimated only for voiced speech, which has a pitch harmonic structure, and the missing pitch harmonic structure is restored based on that estimate. This makes it possible to recover speech information buried in noise and to reduce the speech distortion caused by missing pitch harmonics. Furthermore, because the decision whether to attenuate the speech spectrum is made for each frequency component based on the corrected comb filter, speech enhancement with little speech distortion is possible even when the attenuation is large.

【０１０８】 In addition, generating the first comb filter with a low first speech/non-speech discrimination threshold extracts more speech information. Generating the second comb filter with a high second speech/non-speech discrimination threshold, on the other hand, yields a comb filter that is hardly affected by noise information. The speech pitch harmonic structure can then be restored accurately based on that result.

【０１０９】 In addition, discriminating voiced from unvoiced speech based on the result of generating the second comb filter allows the discrimination to be made simply, with a small amount of computation. In unvoiced sections, neither speech pitch estimation nor restoration of the speech pitch harmonic structure is performed, so unvoiced sections, which have no pitch harmonic structure, are also handled correctly.

【０１１０】 In addition, the speech pitch harmonics can be restored by inserting pitch harmonics based on the speech pitch estimation result. The width of each pitch harmonic is adjusted automatically according to the pitch estimation result, which reduces the effect of speech pitch estimation errors and restores the speech pitch harmonic structure more reliably. Furthermore, the result of restoring the speech pitch harmonic structure is compared with the first comb filter: where the two overlap, the passband of the first comb filter becomes the passband of the corrected comb filter, and everything else becomes the stopband of the corrected comb filter. In this way only the speech pitch harmonic information is extracted and the noise between pitch harmonics is suppressed.
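The combination rule in the last part of this paragraph can be sketched as below. The sketch reads "overlap" component by component, so a component passes only when both the first comb filter and the restored harmonic structure pass it; this per-component reading is an assumption (the text could also be read as keeping a whole contiguous passband of the first comb filter whenever it touches a restored harmonic). The name `correct_comb` is hypothetical.

```python
def correct_comb(first_comb, restored_comb):
    """Corrected comb filter: pass (1) where the first comb filter and the
    restored pitch harmonic structure both pass, stop (0) elsewhere."""
    return [1 if a == 1 and b == 1 else 0
            for a, b in zip(first_comb, restored_comb)]


first_comb = [1, 1, 0, 1, 0, 1]     # low threshold: keeps some noise peaks
restored = [0, 1, 1, 1, 0, 0]       # restored pitch harmonic structure
corrected = correct_comb(first_comb, restored)
print(corrected)  # [0, 1, 0, 1, 0, 0]
```

Components that the first comb filter passed only because of noise between harmonics (indices 0 and 5 here) end up in the stopband.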

【０１１１】 (Embodiment 2) FIG. 4 is a block diagram showing an example of the configuration of a voice enhancement device according to Embodiment 2. Components in common with FIG. 1 are given the same reference numerals as in FIG. 1, and their detailed description is omitted.

【０１１２】 The voice enhancement device 300 of FIG. 4 differs from the voice enhancement device of FIG. 1 in that it includes a speech/noise frame detection unit 301, determines from the first comb filter and the second comb filter whether the speech spectrum contains a speech component, and, when the determination is that no speech component is contained, corrects the first comb filter so that the signal is attenuated at every frequency component.

【０１１３】 Specifically, the voice enhancement device 300 of FIG. 4 differs from the voice enhancement device of FIG. 1 in the way it detects speech/noise frames. The first result is the ratio of the sum of the input speech power spectrum in the passband of the first comb filter to the sum of the input speech power spectrum in the stopband of the first comb filter. The second result is the ratio of the sum of the input speech power spectrum in the passband of the second comb filter to the sum of the input speech power spectrum in the stopband of the second comb filter. When that ratio is larger than a predetermined threshold, the first result and the second result are added; when it is smaller than the threshold, the second result alone is used.

【０１１４】 In FIG. 4, the results output from the first comb filter generation unit 108 and the second comb filter generation unit 109, together with the input speech power spectrum, are input to the speech/noise frame detection unit 301, and the speech/noise frame detection result calculated by the speech/noise frame detection unit 301 is output to the comb filter correction unit 113.

【０１１５】 Specifically, the SN ratios of speech to noise based on the first comb filter and on the second comb filter are first calculated using equations (8) and (9) below.

【０１１６】[0116]

【数８】 [Equation 8]

【０１１７】[0117]

【数９】 [Equation 9]

Next, the SN ratio of the current frame, SNR_frame(n), is calculated using equation (10) below.

【０１１８】[0118]

【数１０】 [Equation 10]

Here, Θ_sn is a threshold. Speech/noise frames are detected by comparing SNR_frame(n) with Θ_sn. If the frame is detected as a noise frame (that is, SNR_frame(n) < Θ_sn), every frequency component of the corrected comb filter COMB_res(k) is set to the stopband.
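Equations (8) to (10) are not reproduced in this text, so the following is only a sketch of the frame detection as the prose describes it. It assumes plain power ratios (no logarithm), assumes that the value compared with the threshold when combining the two results is the second result, and assumes the same threshold Θ_sn is used in both comparisons. The names `band_ratio`, `frame_snr`, and `apply_frame_decision`, and the small `eps` guard against an empty stopband, are hypothetical.

```python
def band_ratio(power, comb, eps=1e-12):
    """Sum of power in the passband divided by sum of power in the stopband."""
    passed = sum(p for p, c in zip(power, comb) if c == 1)
    stopped = sum(p for p, c in zip(power, comb) if c == 0)
    return passed / (stopped + eps)


def frame_snr(power, comb1, comb2, theta_sn):
    """Frame SN ratio: first + second result when the second result exceeds
    the threshold (assumed reading), otherwise the second result alone."""
    snr1 = band_ratio(power, comb1)
    snr2 = band_ratio(power, comb2)
    return snr1 + snr2 if snr2 > theta_sn else snr2


def apply_frame_decision(comb_res, snr_frame, theta_sn):
    """Set every component of the corrected comb filter to the stopband
    when the frame is detected as a noise frame."""
    if snr_frame < theta_sn:
        return [0] * len(comb_res)
    return list(comb_res)


power = [1.0, 9.0, 1.0, 9.0]
comb1 = [0, 1, 0, 1]   # first comb filter (low threshold)
comb2 = [0, 1, 0, 0]   # second comb filter (high threshold)
snr = frame_snr(power, comb1, comb2, theta_sn=0.5)
comb_out = apply_frame_decision(comb1, snr, theta_sn=0.5)
```

With these numbers the second result is 9/11 and the first is 18/2, so the frame is kept as a speech frame for a threshold of 0.5 and blanked as a noise frame for a threshold of 2.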

【０１１９】 As described above, the voice enhancement device of this embodiment determines from the first comb filter and the second comb filter whether the speech spectrum contains a speech component and, when it does not, corrects the first comb filter so that the signal is attenuated at every frequency component. Suddenly occurring noise can thereby be suppressed.

【０１２０】 Specifically, calculating the SN ratio of speech to noise from the first comb filter, whose speech/non-speech discrimination threshold is low, makes speech and noise easier to detect. Calculating the SN ratio from the second comb filter, whose speech/non-speech discrimination threshold is high, reduces false detections caused by sudden noise. The speech/noise frame detection described above exploits both advantages: it detects speech/noise frames more reliably and minimizes the influence of sudden noise on the detection.

【０１２１】 (Embodiment 3) FIG. 5 is a block diagram showing an example of the configuration of a voice enhancement device according to Embodiment 3. Components in common with FIG. 1 are given the same reference numerals as in FIG. 1, and their detailed description is omitted.

【０１２２】 The voice enhancement device 400 of FIG. 5 differs from the voice enhancement device of FIG. 1 in that it includes a local minimum calculation unit 401 and creates a comb filter whose stopband is a predetermined range from each local minimum of the power spectrum of the input signal.

【０１２３】 Specifically, the voice enhancement device 400 of FIG. 5 differs from the voice enhancement device of FIG. 1 in that, within a predetermined frequency region, it generates the first comb filter by making the local minima of the input speech power spectrum the stopband of the first comb filter and all other frequency components the passband.

【０１２４】 In FIG. 5, the input speech spectrum is input to the local minimum calculation unit 401. The output of the local minimum calculation unit 401 is input to the first comb filter generation unit 108 together with the threshold set by the first speech/non-speech discrimination unit 106, and the result of the first comb filter generation unit 108 is output to the comb filter correction unit 113.

【０１２５】 Based on the input speech power spectrum, the local minimum calculation unit 401 takes the frequency components located at local minima within a predetermined frequency region as the transition points between the passband and the stopband of the first comb filter. Specifically, within the predetermined frequency region, the first comb filter is generated by the following procedure.

【０１２６】 For the speech spectrum S_f(k) divided into frequency components (where k is the number identifying the frequency component), when the power of S_f(k) is smaller than the power of both adjacent frequency components S_f(k−1) and S_f(k+1), the first comb filter is set to the stopband at k; when this condition is not satisfied, the first comb filter is set to the passband at k. For frequencies outside the predetermined frequency region, the first comb filter is generated by the same means as in Embodiment 1.
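The procedure above can be sketched as follows, assuming the power spectrum is a list indexed by frequency component and the predetermined frequency region is given as an inclusive index range. The function name `first_comb_from_local_minima` and its parameters are hypothetical. Components outside the region are returned as `None` to indicate that they are left to the Embodiment 1 procedure, which is not reproduced here.

```python
def first_comb_from_local_minima(power, k_lo, k_hi):
    """Within [k_lo, k_hi], mark strict local minima of `power` as stopband (0)
    and every other component as passband (1). Components outside the region
    are None (to be decided by the Embodiment 1 procedure)."""
    comb = [None] * len(power)
    lo = max(k_lo, 1)                     # k-1 must exist
    hi = min(k_hi, len(power) - 2)        # k+1 must exist
    for k in range(lo, hi + 1):
        is_minimum = power[k] < power[k - 1] and power[k] < power[k + 1]
        comb[k] = 0 if is_minimum else 1
    return comb


power = [5.0, 1.0, 5.0, 3.0, 4.0, 2.0, 6.0]
print(first_comb_from_local_minima(power, k_lo=1, k_hi=5))
# [None, 0, 1, 0, 1, 0, None]
```

Because only the relative level of neighbouring components is compared, the rule does not depend on the absolute gap between speech and noise levels.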

【０１２７】 As described above, the voice enhancement device of this embodiment creates a comb filter whose stopband is a predetermined range from each local minimum of the power spectrum of the input signal and suppresses the input signal with this comb filter. Even when the level difference between speech and noise is small, the speech pitch harmonic structure can be extracted and restored, and speech distortion can be reduced.

【０１２８】 Specifically, the voice enhancement device of this embodiment extracts local minima in a predetermined frequency region (in particular, the low-frequency band), makes the frequency components at the local minima the stopband of the first comb filter, and makes all other frequency components the passband. Even in a low-SN-ratio environment, where speech is easily buried in noise, the speech pitch harmonic structure can be extracted and restored more reliably, and the speech distortion caused by a missing speech pitch harmonic structure can be reduced.

【０１２９】 (Embodiment 4) FIG. 6 is a block diagram showing an example of the configuration of a voice enhancement device according to Embodiment 4. Components in common with FIG. 1 are given the same reference numerals as in FIG. 1, and their detailed description is omitted.

【０１３０】 The voice enhancement device 500 of FIG. 6 differs from the voice enhancement device of FIG. 1 in that it includes a noise base subtraction unit 501, subtracts the noise base from the power spectrum of the input signal, estimates the speech pitch based on the autocorrelation function of the subtraction result, and corrects the speech pitch harmonic structure contained in the first comb filter using the estimated speech pitch.

【０１３１】 Specifically, the voice enhancement device 500 of FIG. 6 differs from the voice enhancement device of FIG. 1 in that it subtracts the noise base from the power spectrum of the input speech before calculating the autocorrelation function, and calculates the pitch estimate in the frequency domain from the autocorrelation function of that power spectrum.

【０１３２】 In FIG. 6, the input speech power spectrum and the noise base estimate produced by the noise base estimation unit 105 are input to the noise base subtraction unit 501, which subtracts the noise base estimate from the input speech power spectrum and inputs the result to the pitch estimation unit 111. The signal output from the voiced/unvoiced discrimination unit 110, indicating whether pitch estimation is to be performed, is also input to the pitch estimation unit 111. The pitch period estimated by the pitch estimation unit 111 is output to the pitch harmonic structure restoration unit 112.

【０１３３】 The pitch estimation unit 111 calculates the autocorrelation function of the result of subtracting the noise base from the input speech power spectrum, and takes the lag corresponding to the maximum of the autocorrelation function as the pitch period.

【０１３４】 Specifically, the noise base is subtracted from the input speech power spectrum using equation (11) below, and the autocorrelation function is calculated using equation (12).

【０１３５】[0135]

【数１１】 [Equation 11]

【０１３６】[0136]

【数１２】 [Equation 12]

Here, K_M is the upper limit of the frequency. The value of τ corresponding to the maximum of the autocorrelation function calculated by equation (12) is taken as the pitch period.
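Equations (11) and (12) are not reproduced in this text, so the sketch below only illustrates the idea under assumptions: the subtraction is clamped at zero, the autocorrelation is a plain sum of products over frequency components 0 to K_M, and the lag is searched in a caller-supplied range that excludes τ = 0 (where any autocorrelation is trivially largest). The name `estimate_harmonic_spacing` and its parameters are hypothetical. The returned lag is a spacing between harmonics measured in frequency components.

```python
def estimate_harmonic_spacing(power, noise_base, k_max, tau_min, tau_max):
    """Subtract the noise base from the power spectrum, then return the lag
    (in frequency components) that maximizes the spectral autocorrelation."""
    residual = [max(p - n, 0.0) for p, n in zip(power, noise_base)][: k_max + 1]
    tau_max = min(tau_max, len(residual) - 1)
    best_tau, best_r = None, float("-inf")
    for tau in range(tau_min, tau_max + 1):
        r = sum(residual[k] * residual[k + tau]
                for k in range(len(residual) - tau))
        if r > best_r:
            best_tau, best_r = tau, r
    return best_tau


# Harmonics every 10 components on top of a flat noise base of 0.5.
noise_base = [0.5] * 200
power = [0.5 + (5.0 if k % 10 == 0 and k > 0 else 0.0) for k in range(200)]
spacing = estimate_harmonic_spacing(power, noise_base, k_max=199,
                                    tau_min=5, tau_max=30)
print(spacing)  # 10
```

Without the subtraction, the flat noise base would contribute the same amount to the autocorrelation at every lag and flatten the peak that marks the true spacing.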

【０１３７】 As described above, the voice enhancement device of this embodiment subtracts the noise base from the power spectrum of the input signal, estimates the speech pitch based on the autocorrelation function of the subtraction result, and corrects the speech pitch harmonic structure contained in the first comb filter using the estimated speech pitch. The pitch harmonic structure can thereby be restored, and speech enhancement with little speech distortion can be performed.

【０１３８】 Specifically, because the pitch estimation unit 111 calculates the autocorrelation function from the result of subtracting the noise base from the input speech power spectrum, pitch estimation errors caused by noise are reduced and the pitch harmonic structure can be restored more accurately.

【０１３９】 (Embodiment 5) FIG. 7 is a block diagram showing an example of the configuration of a voice enhancement device according to Embodiment 5. Components in common with FIG. 1 are given the same reference numerals as in FIG. 1, and their detailed description is omitted.

【０１４０】 The voice enhancement device 600 of FIG. 7 differs from the voice enhancement device of FIG. 1 in that it generates a pseudo peak with a predetermined power at the DC component of the result of subtracting the noise base from the power spectrum of the input signal, estimates the speech pitch based on the autocorrelation function of the spectrum with the generated DC component, and corrects the speech pitch of the comb filter using the estimated speech pitch.

【０１４１】 Specifically, the voice enhancement device 600 of FIG. 7 differs from the voice enhancement device of FIG. 1 in that it includes a DC component generation unit 601 and, when calculating the autocorrelation function, generates a pseudo power spectrum with suitable energy as the DC component and calculates the autocorrelation function on that basis.

【０１４２】 In FIG. 7, the DC component generation unit 601 generates a power spectrum with suitable energy at the DC component and inputs it to the pitch estimation unit 111. The signal from the voiced/unvoiced discrimination unit 110 indicating whether pitch estimation is to be performed is also input to the pitch estimation unit 111. The pitch period estimated by the pitch estimation unit 111 is output to the pitch harmonic structure restoration unit 112.

【０１４３】 Specifically, the autocorrelation function is calculated using an input spectrum to which a pseudo power spectrum with the same energy as the power spectrum of the first speech pitch harmonic has been added as the DC component, and the pitch period is estimated based on the result.

【０１４４】 As described above, the voice enhancement device of this embodiment generates a pseudo peak with a predetermined power at the DC component of the result of subtracting the noise base from the power spectrum of the input signal, estimates the speech pitch based on the autocorrelation function of the spectrum with the generated DC component, and restores the pitch harmonic structure using the estimated speech pitch. Even when some of the pitch harmonics are buried in noise, pitch information can be obtained and the pitch harmonic structure restored, so speech enhancement with little speech distortion can be performed.

【０１４５】 Specifically, even when the input speech signal contains no DC component, the DC component can serve in the frequency domain as the origin of the pitch harmonics. Generating a pseudo power spectrum with suitable energy at the DC component adds one more pitch harmonic to serve as a reference, and calculating the autocorrelation function with it allows more accurate pitch estimation. This is particularly effective in reducing pitch estimation errors when the energy of the first speech pitch harmonic is large and the energy of the higher-order pitch harmonics is small, or when the noise level in the low-frequency region is high.
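The effect of the pseudo peak can be seen in a small sketch. It assumes the spectrum is a list indexed by frequency component, uses a plain sum-of-products autocorrelation, and takes the location of the first pitch harmonic as a caller-supplied index; how the device identifies that harmonic is not reproduced here. The names `add_dc_pseudo_peak`, `autocorr`, and `best_lag` are hypothetical.

```python
def add_dc_pseudo_peak(spectrum, first_harmonic_bin):
    """Copy `spectrum` and place at component 0 a pseudo peak with the same
    power as the first pitch harmonic (its index is an assumed input)."""
    out = list(spectrum)
    out[0] = spectrum[first_harmonic_bin]
    return out


def autocorr(spectrum, tau):
    """Spectral autocorrelation at lag `tau` (plain sum of products)."""
    return sum(spectrum[k] * spectrum[k + tau]
               for k in range(len(spectrum) - tau))


def best_lag(spectrum, tau_min, tau_max):
    """Lag in [tau_min, tau_max] with the largest autocorrelation."""
    return max(range(tau_min, tau_max + 1), key=lambda t: autocorr(spectrum, t))


# Strong first harmonic at 10, weak second harmonic at 20, noise peak at 13.
spec = [0.0] * 32
spec[10], spec[20], spec[13] = 9.0, 1.0, 3.0
with_dc = add_dc_pseudo_peak(spec, first_harmonic_bin=10)

lag_without = best_lag(spec, 2, 15)
lag_with = best_lag(with_dc, 2, 15)
```

In this example the weak second harmonic lets the noise peak dominate the autocorrelation, while the pseudo peak at component 0 pairs with the strong first harmonic and pulls the maximum back to the true spacing.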

【０１４６】 Embodiment 5 can be combined with Embodiment 4. That is, using the noise base subtraction unit 501 described in Embodiment 4 in the voice enhancement device of FIG. 7 also provides the effect of Embodiment 4.

【０１４７】 (Embodiment 6) FIG. 8 is a block diagram showing an example of the configuration of a voice enhancement device according to Embodiment 6. Components in common with FIG. 1 or FIG. 4 are given the same reference numerals as in FIG. 1 or FIG. 4, and their detailed description is omitted.

【０１４８】 The voice enhancement device 700 of FIG. 8 differs from the voice enhancement device of FIG. 1 in that it includes a third speech/non-speech discrimination unit 701, a third comb filter generation unit 702, and a noise characteristic estimation unit 703, and estimates the noise variance of the input signal. It calculates a moving average of the number of frequency components in the passband of the third comb filter, judges the noise variance to be large when that value is large and small otherwise, and sets the speech/non-speech discrimination threshold used in generating the second comb filter based on the result.

【０１４９】 Specifically, the voice enhancement device 700 of FIG. 8 differs from the voice enhancement device of FIG. 1 in that it generates, in the frequency domain, a third comb filter for estimating the noise characteristics, sums the number of frequency components in the passband of the third comb filter in noise frames, and determines the second speech/non-speech discrimination threshold based on the moving average of that count.

【０１５０】 The third speech/non-speech discrimination unit 701 judges a frequency component to be in the passband when the difference between the speech spectrum signal output from the frequency division unit 104 and the noise base value output from the noise base estimation unit 105 is equal to or greater than a predetermined threshold, and to be in the stopband otherwise. It outputs the result to the third comb filter generation unit 702.

【０１５１】 The third comb filter generation unit 702 generates the passband and stopband of a comb filter based on the speech/non-speech discrimination result output from the third speech/non-speech discrimination unit 701, and outputs the result to the noise characteristic estimation unit 703. In the noise frames detected by the speech/noise frame detection unit 301, the noise characteristic estimation unit 703 sums the number of frequency components in the passband of the third comb filter, calculates the average over a predetermined number of frames, and outputs the result to the second speech/non-speech discrimination unit 107. Specifically, the noise characteristic is estimated using equation (13) below.

【０１５２】[0152]

【数１３】 [Equation 13]

Here, COMB_var(k) is the third comb filter, NS_var(n) is the noise characteristic estimate, and α_V is the moving average coefficient.

【０１５３】 The second speech/non-speech discrimination threshold is then controlled adaptively by NS_var(n). When NS_var(n) is large, the noise is judged to have a large variance and the second speech/non-speech discrimination threshold is set high; conversely, when NS_var(n) is small, the noise is judged to have a small variance and the threshold is set low.
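Equation (13) is not reproduced in this text, so the following sketch assumes a first-order recursive (exponential) moving average of the passband count, updated only in noise frames, and a two-level threshold rule. The names `update_ns_var` and `second_threshold`, the parameters `ns_limit`, `theta_low`, and `theta_high`, and the two-level mapping itself are all hypothetical.

```python
def update_ns_var(ns_var_prev, comb_var, alpha_v, is_noise_frame):
    """Moving average of the number of passband components of the third
    comb filter, updated only in noise frames (assumed recursive form)."""
    if not is_noise_frame:
        return ns_var_prev
    count = sum(comb_var)  # number of components with flag 1
    return alpha_v * ns_var_prev + (1.0 - alpha_v) * count


def second_threshold(ns_var, ns_limit, theta_low, theta_high):
    """High threshold for high-variance noise, low threshold otherwise."""
    return theta_high if ns_var > ns_limit else theta_low


ns_var = 0.0
ns_var = update_ns_var(ns_var, [1, 1, 0, 1], alpha_v=0.5, is_noise_frame=True)
theta2 = second_threshold(ns_var, ns_limit=1.0, theta_low=3.0, theta_high=6.0)
print(ns_var, theta2)  # 1.5 6.0
```

Many passband components in a frame known to contain only noise mean that the noise often rises well above its base level, which is why the count serves as an indirect measure of the noise variance.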

【０１５４】 As described above, the voice enhancement device of this embodiment detects the distribution of the noise level of the input signal and, based on this distribution, determines the criterion for generating a comb filter from the speech spectrum. Noise can thus be suppressed in a manner suited to the type of noise, and speech enhancement with little speech distortion can be performed.

【０１５５】 Specifically, providing a third comb filter for estimating the noise characteristics allows the noise variance to be estimated indirectly with a simple calculation. Setting the second speech/non-speech discrimination threshold based on that estimate reduces the false pitch harmonics that high-variance noise introduces when the second comb filter is generated. For low-variance noise, more speech pitch harmonic information can be retained.

【０１５６】 (Embodiment 7) FIG. 9 is a block diagram showing an example of the configuration of a voice enhancement device according to Embodiment 7. Components in common with FIG. 1 and FIG. 4 are given the same reference numerals as in FIG. 1 and FIG. 4, and their detailed description is omitted.

【０１５７】 The voice enhancement device 800 of FIG. 9 differs from the voice enhancement device of FIG. 1 in that it includes a frequency region selection unit 801 and, when restoring the speech pitch harmonic structure, converts all frequency components in a predetermined frequency region of the second comb filter to the stopband based on the result of the third comb filter.

【０１５８】 In FIG. 9, the noise characteristic estimation unit 703 estimates the noise characteristics based on the result output from the third comb filter generation unit 702 and outputs the estimate to the frequency region selection unit 801. Based on the noise characteristic estimate, the frequency region selection unit 801 determines the intermediate frequency region in which the second comb filter is to be set to the stopband, and outputs the result to the second comb filter generation unit 109.

【０１５９】 Specifically, when the moving average of the noise characteristic calculated by the noise characteristic estimation unit 703 exceeds a certain threshold, the noise is judged to have a large variance, and all frequency components in the intermediate frequency region of the second comb filter, for example between 1 kHz and 2 kHz, are converted to the stopband.
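A minimal sketch of this region selection follows. It assumes the second comb filter is a list of 0/1 flags covering components 0 to half the FFT size, and that component k lies at k × sample_rate / fft_size Hz. The sampling rate, FFT size, the limit `ns_limit`, and the function name `block_mid_band` are hypothetical; only the 1 kHz to 2 kHz example band comes from the text.

```python
def block_mid_band(comb2, ns_var, ns_limit, sample_rate, fft_size,
                   f_lo=1000.0, f_hi=2000.0):
    """If the noise characteristic exceeds `ns_limit`, set every component of
    the second comb filter between f_lo and f_hi (Hz) to the stopband."""
    if ns_var <= ns_limit:
        return list(comb2)
    bin_hz = sample_rate / fft_size
    return [0 if f_lo <= k * bin_hz <= f_hi else c
            for k, c in enumerate(comb2)]


# 8 kHz sampling and a 16-point transform give 500 Hz per component.
comb2 = [1] * 9
print(block_mid_band(comb2, ns_var=5.0, ns_limit=2.0,
                     sample_rate=8000, fft_size=16))
# [1, 1, 0, 0, 0, 1, 1, 1, 1]
```

When the noise characteristic stays at or below the limit, the second comb filter is returned unchanged.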

【０１６０】 As described above, the voice enhancement device of this embodiment selects a frequency region based on the noise characteristic estimate and converts the whole of the selected region of the second comb filter to the stopband, which reduces the false pitch harmonics generated by high-variance noise. Restoring the pitch harmonic structure as described in Embodiment 1, using as the reference the pitch harmonics in the low-frequency region where false pitch harmonics are unlikely to arise, then restores the pitch harmonic structure accurately.

[0161] Embodiment 7 can be combined with Embodiment 6. That is, if the noise characteristic estimation unit 703 described in Embodiment 6 is used in the speech enhancement apparatus of FIG. 9, the effect of Embodiment 6 can also be obtained.

[0162] (Embodiment 8) FIG. 10 is a block diagram showing an example of the configuration of a speech enhancement apparatus according to Embodiment 8. Components common to FIG. 1 are given the same reference numerals as in FIG. 1, and their detailed description is omitted.

[0163] The speech enhancement apparatus 900 of FIG. 10 differs from the speech enhancement apparatus of FIG. 1 in that it includes an SNR estimation unit 901 and, in the speech separation coefficient calculation means, adjusts the amount of noise attenuation according to the magnitude of the estimated SNR.

[0164] In FIG. 10, the SNR estimation unit 901 calculates the ratio of the speech level to the noise level based on the input speech power spectrum and the noise base estimate output from the noise base estimation unit 105, and outputs the result to the speech separation coefficient calculation unit 114. The speech separation coefficient calculation unit 114 calculates the amount of noise attenuation for each frequency component according to the magnitude of the estimated SNR, and the multiplication unit 115 multiplies it by the input speech spectrum. Specifically, the SNR is calculated using the following equations (14) to (16).

[0165]

[Equation 14]

[0166]

[Equation 15]

[0167]

[Equation 16]

Here, S_p(n) is the moving average of the speech level, N_s(n) is the moving average of the noise, and α_S is the moving-average coefficient.
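Equations (14) to (16) are not reproduced in this text, so the following is only a sketch of a moving-average SNR estimate of the kind the paragraph describes. The recursive first-order form, the use of band sums as the per-frame levels, and all names and default values are assumptions made for illustration.

```python
def update_snr(power, noise_base, s_prev, n_prev, alpha_s=0.9):
    """One frame of a moving-average SNR estimate (assumed form).

    power      : input speech power spectrum for frame n (list of floats)
    noise_base : noise base estimate per frequency bin (list of floats)
    s_prev     : S_p(n-1), previous moving average of the speech level
    n_prev     : N_s(n-1), previous moving average of the noise level
    alpha_s    : moving-average coefficient
    """
    frame_speech = sum(power)        # assumed per-frame speech level
    frame_noise = sum(noise_base)    # assumed per-frame noise level
    s_p = alpha_s * s_prev + (1.0 - alpha_s) * frame_speech
    n_s = alpha_s * n_prev + (1.0 - alpha_s) * frame_noise
    snr = s_p / n_s if n_s > 0.0 else float("inf")
    return s_p, n_s, snr
```

The caller keeps S_p and N_s between frames and feeds them back in on the next call.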

[0168] According to the SNR value, the speech separation coefficient (the amount of noise attenuation) in the pass band and the stop band of the modified comb filter is calculated using the following equation (17).

[0169]

[Equation 17]

Here, γ(n) is a coefficient indicating the amount of the noise base to be subtracted, and gc(n) is a coefficient indicating the degree of noise attenuation. The values of γ(n) and gc(n) are adjusted automatically from the estimated SNR so that environments with different SN ratios can be handled. For example, γ(n) is adjusted automatically to be directly proportional to SNR(n), and gc(n) to be inversely proportional to SNR(n).
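The proportional and inversely proportional adjustment mentioned above might look like the sketch below. The proportionality constants and the upper clamps are hypothetical; the source only states the direction of the dependence on SNR(n).

```python
def adapt_coefficients(snr, k_gamma=0.1, k_gc=2.0, gamma_max=1.0, gc_max=1.0):
    """Derive gamma(n) and gc(n) from the estimated SNR (assumed form).

    gamma grows in proportion to the SNR, gc shrinks in inverse proportion.
    Both are clamped to an upper limit, which is an assumption added here
    so that the values stay bounded at extreme SNRs.
    """
    gamma = min(k_gamma * snr, gamma_max)
    gc = gc_max if snr <= 0.0 else min(k_gc / snr, gc_max)
    return gamma, gc
```

For instance, doubling the SNR from 4 to 8 doubles γ and halves gc as long as neither clamp is active.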

[0170] As described above, the speech separation apparatus of this embodiment adjusts, in the pass band and the stop band of the modified comb filter, both the amount of the noise base subtracted from the input speech power spectrum and the degree of noise attenuation according to the magnitude of the estimated SNR. Appropriate noise attenuation is thereby performed even in environments with different SNRs, and speech enhancement with little speech distortion and little residual noise can be achieved.

[0171] (Embodiment 9) FIG. 11 is a block diagram showing an example of the configuration of a speech enhancement apparatus according to Embodiment 9. Components common to FIG. 1 and FIG. 10 are given the same reference numerals as in FIG. 1 and FIG. 10, and their detailed description is omitted.

[0172] The speech enhancement apparatus 1000 of FIG. 11 differs from the speech enhancement apparatus of FIG. 1 in that it includes a noise base subtraction unit 1001 and a weighting coefficient calculation unit 1002, calculates the level of the speech component from the moving average of the power spectrum of the input signal, calculates the level of the noise component from the noise base estimate multiplied by a weighting coefficient for each frequency component, and calculates the signal-to-noise ratio from the ratio of the speech component level to the noise component level.

[0173] In FIG. 11, the noise base estimation unit 105 estimates the noise base and outputs the result to the noise base subtraction unit 1001 and the weighting coefficient calculation unit 1002. The noise base subtraction unit 1001 calculates the moving average of the input speech power spectrum, subtracts the noise base estimate from the moving average, and outputs the result to the SNR estimation unit 901. The weighting coefficient calculation unit 1002 calculates a weighting coefficient for each frequency component of the noise base estimate and outputs the result to the SNR estimation unit 901. The SNR estimation unit 901 calculates the ratio of the speech level to the noise level and outputs the result to the speech separation coefficient calculation unit 114. Specifically, the SNR is calculated using the following equations (18) to (20).

[0174]

[Equation 18]

[0175]

[Equation 19]

[0176]

[Equation 20]

Here, β is a coefficient indicating the amount of the noise base to be subtracted, and δ(k) is a weighting coefficient. The weighting coefficient δ(k) is set using the characteristics of speech. For example, the energy of the speech spectrum in the intermediate frequency region is small but has a large effect on speech intelligibility, so if the noise level is calculated with a larger weighting coefficient for noise in the intermediate frequency region, different kinds of noise can be attenuated appropriately.
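Since equations (18) to (20) are missing from this text, the sketch below only illustrates the idea stated in the surrounding paragraphs: a speech level obtained by subtracting β times the noise base from the averaged power spectrum, a noise level weighted per bin by δ(k), and their ratio. Summation over bins and flooring the difference at zero are assumptions.

```python
def weighted_snr(avg_power, noise_base, delta, beta=1.0):
    """SNR from a noise-subtracted speech level and a weighted noise level.

    avg_power  : moving average of the input power spectrum per bin
    noise_base : noise base estimate per bin
    delta      : weighting coefficient delta(k) per bin
    beta       : amount of the noise base to subtract
    """
    # Speech level: averaged power minus beta * noise base, floored at zero (assumed).
    speech = sum(max(p - beta * nb, 0.0) for p, nb in zip(avg_power, noise_base))
    # Noise level: noise base weighted per frequency bin.
    noise = sum(d * nb for d, nb in zip(delta, noise_base))
    return speech / noise if noise > 0.0 else float("inf")
```

Raising δ(k) in the middle bins increases the computed noise level and so lowers the SNR estimate, which in turn calls for stronger attenuation when mid-band noise is present.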

[0177] As described above, the speech enhancement apparatus of this embodiment calculates the speech level by subtracting the noise base from the moving average of the input speech power spectrum, which reduces the influence of noise and allows an accurate speech level to be calculated even in a low-SNR environment. In addition, because each frequency component of the noise base estimate is multiplied by a weighting coefficient, different kinds of noise are attenuated appropriately and speech distortion can be reduced.

[0178] Embodiment 9 can be combined with Embodiment 8. That is, if the SNR estimation unit 901 described in Embodiment 8 is used in the speech enhancement apparatus of FIG. 11, the effect of Embodiment 8 can also be obtained.

[0179] (Embodiment 10) FIG. 12 is a block diagram showing an example of the configuration of a speech enhancement apparatus according to Embodiment 10. Components common to FIG. 1, FIG. 10, and FIG. 11 are given the same reference numerals as in FIG. 1, FIG. 10, and FIG. 11, and their detailed description is omitted.

[0180] The speech enhancement apparatus 1100 of FIG. 12 differs from the speech enhancement apparatus of FIG. 1 in that it includes an SNR fluctuation suppression unit 1101 that suppresses fluctuation of the SNR based on the estimated SNR and the long-term moving average of the estimated SNR.

[0181] In FIG. 12, the SNR estimation unit 901 calculates the ratio of the speech level to the noise level and outputs the result to the SNR fluctuation suppression unit 1101. The SNR fluctuation suppression unit 1101 calculates the long-term moving average of the SNR from the estimated SNR, calculates the deviation between that average and the estimated SNR, and adds a part of the deviation to the long-term moving average to obtain the fluctuation-suppressed SNR. It then outputs the fluctuation-suppressed SNR estimate to the speech separation coefficient calculation unit 114.

[0182] Specifically, the long-term moving average of the SNR is calculated using the following equation (21), and the fluctuation-suppressed SNR estimate is calculated using equation (22).

[0183]

[Equation 21]

[0184]

[Equation 22]

Here, α_r is the moving-average coefficient, and μ is a coefficient that determines the size of the deviation to be added.
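The procedure described in the preceding paragraphs (long-term average, deviation, partial re-addition of the deviation) can be sketched as below. The recursive first-order form of the long-term average and the default values of α_r and μ are assumptions, since equations (21) and (22) are not reproduced in this text.

```python
def suppress_snr_fluctuation(snr, snr_long_prev, alpha_r=0.99, mu=0.3):
    """Return (new long-term average, fluctuation-suppressed SNR).

    snr           : SNR estimate for the current frame
    snr_long_prev : long-term moving average from the previous frame
    alpha_r       : moving-average coefficient (close to 1 for a long-term average)
    mu            : fraction of the deviation that is added back (0 <= mu <= 1)
    """
    snr_long = alpha_r * snr_long_prev + (1.0 - alpha_r) * snr   # long-term average
    deviation = snr - snr_long                                   # instantaneous deviation
    return snr_long, snr_long + mu * deviation                   # average plus part of it
```

With μ = 0 the output is the long-term average alone, and with μ = 1 it equals the raw estimate, so μ sets how much frame-to-frame variation is let through.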

[0185] As described above, the speech enhancement apparatus of this embodiment calculates the deviation between the estimated SNR and its long-term moving average, and uses the long-term moving average plus a part of that deviation as the SNR estimate. Fluctuation of the SNR is thereby suppressed effectively, and the level of noise attenuation can be adjusted stably according to the magnitude of the SNR.

[0186] Embodiment 10 can be combined with Embodiment 8 or Embodiment 9. That is, if the SNR estimation unit 901 described in Embodiment 8 is used in the speech enhancement apparatus of FIG. 12, the effect of Embodiment 8 can also be obtained, and if the SNR is estimated in the speech enhancement apparatus of FIG. 12 using the noise base subtraction and weighting coefficient calculation means described in Embodiment 9, the effect of Embodiment 9 can also be obtained.

[0187] (Embodiment 11) FIG. 13 is a block diagram showing an example of the configuration of a speech enhancement apparatus according to Embodiment 11. Components common to FIG. 1 are given the same reference numerals as in FIG. 1, and their detailed description is omitted.

[0188] The speech enhancement apparatus 1200 of FIG. 13 differs from the speech enhancement apparatus of FIG. 1 in that it includes a noise base update unit 1201 with a fast update speed and can therefore track the noise base even in speech sections.

[0189] In FIG. 13, the noise base update unit 1201 estimates the noise base from the input speech power spectrum using a moving-average coefficient with a fast update speed and outputs the result to the noise base estimation unit 105. The noise base estimation unit 105 estimates the noise base using a moving-average coefficient with a slow update speed and outputs the result to the first speech/non-speech identification unit 106 and the second speech/non-speech identification unit 107.

[0190] Specifically, the noise base with a fast update speed and the noise base with a slow update speed are estimated using the following equations (23) and (24).

[0191]

[Equation 23]

[0192]

[Equation 24]

Here, α_f and α_s are the fast update coefficient and the slow update coefficient, respectively, and Θ_fast is a threshold for distinguishing speech from noise.
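Equations (23) and (24) are not available in this text. The sketch below shows one plausible two-rate arrangement consistent with the description: a fast noise base that follows the input power every frame, and a slow noise base that is pulled toward the fast one whenever the bin is judged to be noise. The exact update conditions, the way Θ_fast is applied, and the default values are assumptions.

```python
def update_noise_bases(power, n_fast, n_slow,
                       alpha_f=0.5, alpha_s=0.95, theta_fast=1.5):
    """One frame of a fast/slow noise base update (assumed form).

    power  : input speech power spectrum per bin
    n_fast : fast noise base per bin from the previous frame
    n_slow : slow noise base per bin from the previous frame
    """
    new_fast, new_slow = [], []
    for p, nf, ns in zip(power, n_fast, n_slow):
        # Fast base: updated on every frame, so it also moves during speech.
        nf = alpha_f * nf + (1.0 - alpha_f) * p
        # Slow base: follows the fast base only when the bin looks like noise.
        if p < theta_fast * nf:
            ns = alpha_s * ns + (1.0 - alpha_s) * nf
        new_fast.append(nf)
        new_slow.append(ns)
    return new_fast, new_slow
```

In this arrangement a sudden jump in noise level first fails the noise test, but the fast base rises within a frame or two, after which the slow base resumes updating instead of stalling.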

[0193] As described above, the speech enhancement apparatus of this embodiment estimates the noise base using a moving-average coefficient with a fast update speed, so that abrupt changes in the noise level can be tracked even in speech sections. In addition, because the slowly updated noise base is updated based on the rapidly updated noise base, the noise base can be estimated accurately, and the noise base update is prevented from stalling when the noise level changes abruptly.

[0194] (Embodiment 12) FIG. 14 is a block diagram showing the configuration of a noise separation apparatus according to Embodiment 12 of the present invention. The noise separation apparatus 1300 of this embodiment separates and extracts the noise signal from a speech signal that contains noise.

[0195] In FIG. 14, the noise separation apparatus 1300 is mainly composed of a time division unit 101, a windowing unit 102, an FFT unit 103, a frequency division unit 104, a noise base estimation unit 105, a speech/non-speech identification unit 1301, a noise comb filter generation unit 1302, a real/imaginary separation unit 1303, a noise separation coefficient calculation unit 1304, a multiplication unit 1305, a noise frequency synthesis unit 1306, and an IFFT unit 1307.

[0196] Components common to FIG. 1 are given the same reference numerals as in FIG. 1, and their detailed description is omitted.

[0197] The speech/non-speech identification unit 1301 judges a frequency component to be a voiced part containing a speech component when the difference between the speech spectrum signal output from the frequency division unit 104 and the noise base estimate output from the noise base estimation unit 105 is equal to or greater than a predetermined threshold, and otherwise judges it to be a silent part containing only noise and no speech component. It outputs the result to the noise comb filter generation unit 1302. The noise comb filter generation unit 1302 generates a noise separation comb filter based on the result of the speech/non-speech identification unit 1301 and outputs this comb filter to the real/imaginary separation unit 1303.

[0198] Specifically, the noise comb filter generation unit 1302 sets the speech/non-speech identification threshold low so that speech information is suppressed, and generates the noise separation comb filter using the following equation (25).

[0199]

[Equation 25]

Here, Θ_nos is the noise separation threshold.
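Equation (25) is not reproduced here. A minimal sketch of a per-bin decision of the kind described, using the difference test stated for the speech/non-speech identification unit 1301, could look like this. The 0/1 encoding (1 = bin judged to contain speech, i.e. pass band; 0 = noise only, i.e. stop band) and the function name are assumptions.

```python
def noise_comb_filter(power, noise_base, theta_nos):
    """Build a noise separation comb filter (assumed form).

    A bin is marked 1 when the input power exceeds the noise base by at
    least theta_nos (judged to contain speech), and 0 otherwise (noise only).
    A low theta_nos marks more bins as speech, so less speech leaks into
    the extracted noise.
    """
    return [1 if p - nb >= theta_nos else 0
            for p, nb in zip(power, noise_base)]
```

Lowering Θ_nos from 3 to 1 in the test data below flips the middle bin from noise to speech, which illustrates why a low threshold suppresses speech information in the separated noise.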

[0200] The real/imaginary separation unit 1303 separates the real part and the imaginary part of the input speech spectrum and outputs the result to the noise separation coefficient calculation unit 1304. The noise separation coefficient calculation unit 1304 calculates separate separation coefficients for each frequency component for the pass band and the stop band of the noise separation comb filter, and outputs the result to the multiplication unit 1305.

[0201] Specifically, using the following equations (26) and (27), the noise separation coefficient is set to 1 in the stop band of the noise separation comb filter, and in the pass band of the noise separation comb filter the noise separation coefficient multiplies the real part and the imaginary part of the input speech spectrum by separate random numbers and by the noise base estimate.

[0202]

[Equation 26]

[0203]

[Equation 27]

Here, rd_re(i) is a random function used for the real part, composed of uniformly distributed random numbers, and rd_im(i) is a random function used for the imaginary part, composed of uniformly distributed random numbers.
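Because equations (26) and (27) are missing, the following sketch takes one possible reading of the description: bins in the stop band pass through unchanged (coefficient 1), and bins in the pass band are replaced by pseudo-noise whose real and imaginary parts are independent uniform random numbers scaled by the noise base. The range [-1, 1] of the random numbers, the use of the noise base as an amplitude, and the function name are assumptions.

```python
import random


def separate_noise(spectrum, comb_nos, noise_amp, rng=random):
    """Apply noise separation per bin (assumed form).

    spectrum  : complex input spectrum per bin
    comb_nos  : noise separation comb filter (0 = stop band, 1 = pass band)
    noise_amp : noise base estimate per bin, used here as an amplitude scale
    rng       : object with a uniform(a, b) method (random module or Random instance)
    """
    out = []
    for x, c, nb in zip(spectrum, comb_nos, noise_amp):
        if c == 0:
            out.append(x)                      # stop band: coefficient 1, bin kept
        else:
            re = rng.uniform(-1.0, 1.0) * nb   # rd_re(i) scaled by the noise base
            im = rng.uniform(-1.0, 1.0) * nb   # rd_im(i) scaled by the noise base
            out.append(complex(re, im))        # pass band: randomized pseudo-noise
    return out
```

Passing a seeded `random.Random` instance as `rng` makes the output reproducible, which is convenient when comparing variants.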

[0204] The multiplication unit 1305 multiplies the speech spectrum output from the frequency division unit 104 by the separation coefficient output from the noise separation coefficient calculation unit 1304 for each frequency component. It then outputs the noise spectrum obtained by the multiplication to the noise frequency synthesis unit 1306.

[0205] The noise frequency synthesis unit 1306 synthesizes the per-frequency-component spectrum output from the multiplication unit 1305 into a noise spectrum that is continuous in the frequency domain, in units of a predetermined processing time, and outputs it to the IFFT unit 1307.

[0206] The IFFT unit 1307 applies an IFFT (Inverse Fast Fourier Transform) to the noise spectrum output from the noise frequency synthesis unit 1306 and outputs the resulting noise signal.

[0207] As described above, the noise separation apparatus of this embodiment generates a comb filter dedicated to noise, so that the characteristics of the noise can be extracted to the fullest. In addition, the noise component is not attenuated in the stop band of the noise separation comb filter, while in the pass band the real part and the imaginary part of the input speech spectrum are multiplied by separate random numbers and by the noise base estimate, so that the amplitude and phase of the noise component are fully randomized and good noise separation characteristics are obtained.

[0208] (Embodiment 13) FIG. 15 is a block diagram showing an example of the configuration of a noise separation apparatus according to Embodiment 13. Components common to FIG. 1 and FIG. 14 are given the same reference numerals as in FIG. 1 and FIG. 14, and their detailed description is omitted.

[0209] The noise separation apparatus of FIG. 15 differs from that of FIG. 14 in that it includes a noise component storage unit 1401, stores in memory the spectral components of the input speech in the stop band of the noise separation comb filter, and uses the stored values in the pass band of the noise separation comb filter.

[0210] In FIG. 15, the noise component storage unit 1401 stores the input speech spectrum in the stop band of the noise comb filter output from the noise comb filter generation unit 1302, and in the pass band of the noise comb filter outputs the stored noise components to the noise separation coefficient calculation unit 1304.

[0211] Specifically, for example, a predetermined number of memories are prepared, and the input speech spectrum in the stop band of the noise comb filter is stored sequentially from the low frequency region to the high frequency region. In the pass band of the noise comb filter, proceeding in the same order from the low frequency region to the high frequency region, the most recently stored data having the closest frequency component is selected and used as the input speech spectrum in the pass band of the noise comb filter.
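The storage-and-reuse scheme described above can be sketched as follows. The dictionary keyed by bin index, the tie-breaking toward the lower bin, and the absence of a limit on the number of stored entries are simplifications assumed for this illustration; the source speaks of a predetermined number of memories.

```python
def fill_from_noise_memory(spectrum, comb_nos, memory):
    """Replace pass-band bins with stored noise-only bins (assumed form).

    spectrum : input spectrum per bin for the current frame
    comb_nos : noise comb filter (0 = stop band / noise only, 1 = pass band)
    memory   : dict mapping bin index -> most recent noise-only value;
               updated in place and kept by the caller between frames
    """
    # Store the noise-only bins, from low to high frequency.
    for k, (x, c) in enumerate(zip(spectrum, comb_nos)):
        if c == 0:
            memory[k] = x
    out = []
    for k, (x, c) in enumerate(zip(spectrum, comb_nos)):
        if c == 0 or not memory:
            out.append(x)            # noise-only bin, or nothing stored yet
        else:
            # Pick the stored bin closest in frequency (lower bin wins a tie).
            nearest = min(memory, key=lambda j: (abs(j - k), j))
            out.append(memory[nearest])
    return out
```

Because the memory persists across calls, a frame that consists entirely of pass-band bins is still filled with the noise observed in earlier frames.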

[0212] As described above, the noise separation apparatus of this embodiment stores in memory the spectral components of the input speech in the stop band of the noise separation comb filter and uses the stored values in the pass band of the noise separation comb filter. Pseudo-noise with characteristics close to those of the actual noise can thereby be reconstructed, and good noise separation characteristics are obtained.

[0213] Embodiment 13 can be combined with Embodiment 12. That is, if the noise separation coefficient calculation unit 1304 of Embodiment 12 is used in the noise separation apparatus of FIG. 15, the effect of Embodiment 12 can also be obtained.

[0214] The present invention is not limited to the above embodiments; a plurality of embodiments can be combined, and various modifications can be made. For example, the above embodiments describe the case where the invention is carried out as a speech enhancement apparatus or a noise suppression apparatus, but the invention is not limited to this, and the speech enhancement method or noise suppression method can also be carried out as software.

[0215] For example, a program that executes the above speech enhancement method or noise suppression method may be stored in advance in a ROM (Read Only Memory) and run by a CPU (Central Processor Unit).

[0216] Alternatively, a program that executes the above speech enhancement method or noise suppression method may be stored in a computer-readable storage medium, the program stored in the storage medium may be loaded into the RAM (Random Access Memory) of a computer, and the computer may be operated according to the program.

[0217] A program that performs the above speech enhancement or noise suppression may also be stored on a server, transferred to a client, and executed on the client. In such a case as well, the same operation and effects as in the above embodiments are obtained.

[0218] The speech enhancement apparatus or noise suppression apparatus according to any of the above embodiments can also be installed in a radio communication apparatus, a communication terminal, a base station apparatus, or the like. As a result, speech during communication can be enhanced, or noise can be extracted from it.

[0219]

[Effects of the Invention] As described above, according to the speech enhancement apparatus and speech enhancement method of the present invention, a comb filter from which more noise peaks have been removed than from the comb filter used for speech suppression is generated based on the frequency-divided spectrum of the speech signal, pitch information of the speech signal is obtained using this comb filter, and the speech pitch of the comb filter is supplemented, so that noise can be removed sufficiently with little speech distortion.

[Brief description of drawings]

FIG. 1 is a block diagram showing the configuration of a speech enhancement apparatus according to Embodiment 1 of the present invention.

FIG. 2 is a diagram showing an example of a comb filter created by the speech enhancement apparatus according to the above embodiment.

FIG. 3 is a diagram showing an example of comb filter restoration in the speech processing apparatus according to the above embodiment.

FIG. 4 is a block diagram showing the configuration of a speech enhancement apparatus according to Embodiment 2 of the present invention.

FIG. 5 is a block diagram showing the configuration of a speech enhancement apparatus according to Embodiment 3 of the present invention.

FIG. 6 is a block diagram showing the configuration of a speech enhancement apparatus according to Embodiment 4 of the present invention.

FIG. 7 is a block diagram showing the configuration of a speech enhancement apparatus according to Embodiment 5 of the present invention.

FIG. 8 is a block diagram showing the configuration of a speech enhancement apparatus according to Embodiment 6 of the present invention.

FIG. 9 is a block diagram showing the configuration of a speech enhancement apparatus according to Embodiment 7 of the present invention.

FIG. 10 is a block diagram showing the configuration of a speech enhancement apparatus according to Embodiment 8 of the present invention.

FIG. 11 is a block diagram showing the configuration of a speech enhancement apparatus according to Embodiment 9 of the present invention.

FIG. 12 is a block diagram showing the configuration of a speech enhancement apparatus according to Embodiment 10 of the present invention.

FIG. 13 is a block diagram showing the configuration of a speech enhancement apparatus according to Embodiment 11 of the present invention.

FIG. 14 is a block diagram showing the configuration of a noise separation apparatus according to Embodiment 12 of the present invention.

FIG. 15 is a block diagram showing the configuration of a noise separation apparatus according to Embodiment 13 of the present invention.

[Explanation of symbols]

103 FFT unit
104 Frequency division unit
105 Noise base estimation unit
106 First speech/non-speech identification unit
107 Second speech/non-speech identification unit
108 First comb filter generation unit
109 Second comb filter generation unit
110 Voiced/unvoiced determination unit
111 Pitch estimation unit
112 Pitch harmonic structure restoration unit
113 Comb filter correction unit
114 Speech separation coefficient calculation unit
115, 1305 Multiplication unit
116 Speech frequency synthesis unit
117 IFFT unit
301 Speech/noise frame detection unit
401 Local minimum value calculation unit
501, 1001 Noise base subtraction unit
601 DC component generation unit
701 Third speech/non-speech identification unit
702 Third comb filter generation unit
703 Noise characteristic estimation unit
801 Frequency domain selection unit
901 SNR estimation unit
1002 Weighting coefficient calculation unit
1101 SNR fluctuation suppression unit
1201 Noise base update unit
1301 Speech/non-speech identification unit
1302 Noise comb filter generation unit
1303 Real/imaginary separation unit
1304 Noise separation coefficient calculation unit
1306 Noise frequency synthesis unit
1401 Noise component storage unit

Claims

1. A speech enhancement apparatus comprising: frequency division means for outputting a frequency-divided spectrum obtained by dividing the spectrum of an input signal into predetermined frequency units; first comb filter creation means for creating, based on the frequency-divided spectrum, a first comb filter that attenuates the signal in frequency regions determined to contain no speech component; second comb filter creation means for creating, based on the frequency-divided spectrum, a second comb filter from which more noise peaks are removed than from the first comb filter; comb filter correction means for correcting the speech pitch harmonic structure contained in the first comb filter using a speech pitch estimated from the second comb filter and the frequency-divided spectrum; suppression means for suppressing noise in the frequency-divided spectrum using the first comb filter corrected by the comb filter correction means; and speech frequency synthesis means for synthesizing the noise-suppressed frequency-divided spectrum into a spectrum signal that is continuous in the frequency domain.
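As an illustration only, the two comb filters of claim 1 can be pictured as binary pass/stop masks over the frequency bins, with the second mask built using a higher speech/non-speech threshold as the abstract describes. The sketch below is a minimal reading under stated assumptions: the function names, the threshold values, the multiplicative comparison against the noise base, and the fixed attenuation factor are all hypothetical, and the pitch-based correction step is omitted.

```python
def make_comb_filter(power, noise_base, threshold):
    """Return 1 (pass band) for bins judged to contain speech, else 0 (stop band).

    Assumption: a bin is judged as speech when its power exceeds
    threshold times the noise base for that bin.
    """
    return [1 if p > threshold * n else 0 for p, n in zip(power, noise_base)]


def suppress(spectrum, comb, attenuation=0.1):
    """Attenuate the bins that fall in the stop band of the comb filter."""
    return [x if c else x * attenuation for x, c in zip(spectrum, comb)]


# Hypothetical per-bin power spectrum and a flat noise base.
power = [9.0, 1.0, 4.0, 1.2, 16.0, 0.8]
noise_base = [1.0] * 6

# Low threshold: keeps weak harmonics but also lets a small noise peak through.
first_comb = make_comb_filter(power, noise_base, threshold=1.1)
# High threshold: removes the noise peak, at the cost of losing a weak harmonic.
second_comb = make_comb_filter(power, noise_base, threshold=5.0)
```

With these numbers the second mask passes fewer bins than the first, which is the situation the correction means is meant to reconcile.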

2. The speech enhancement apparatus according to claim 1, further comprising speech/noise frame detection means for determining, from the first comb filter and the second comb filter, whether a speech component is included in the speech spectrum, wherein, when the speech/noise frame detection means determines that no speech component is included, the comb filter correction means modifies the first comb filter so as to attenuate the signal at every frequency component.

3. The speech enhancement apparatus according to claim 2, wherein the speech/noise frame detection means takes, as a first result, the ratio between the sum of the power spectrum of the input signal in the pass band of the first comb filter and the sum of the power spectrum of the input signal in the stop band of the first comb filter; takes, as a second result, the ratio between the sum of the power spectrum of the input signal in the pass band of the second comb filter and the sum of the power spectrum of the input signal in the stop band of the second comb filter; and determines whether the speech spectrum includes speech using the addition result when the addition result is larger than a predetermined threshold, and using the second result when the addition result is less than or equal to the predetermined threshold.

4. The speech enhancement apparatus according to any one of claims, wherein the first comb filter creation means creates a first comb filter whose stop band lies within a predetermined range from the minimum value of the power spectrum of the input signal.

5. The speech enhancement apparatus according to any one of claims 1 to 4, further comprising pitch estimation means for subtracting a noise base from the power spectrum of the input signal and estimating a speech pitch using the autocorrelation function of the subtraction result, wherein the comb filter correction means corrects the speech pitch harmonic structure contained in the first comb filter using the speech pitch estimated by the pitch estimation means.
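The pitch estimation of claim 5 can be sketched as an autocorrelation taken along the frequency axis of the noise-subtracted power spectrum, where the lag of the strongest peak gives the harmonic spacing in bins. This is a rough sketch, not the patented procedure: the function name, the flooring of negative values at zero, and the lag search range are assumptions.

```python
def estimate_pitch_bins(power, noise_base, min_lag=2):
    """Estimate the harmonic spacing (in bins) of a power spectrum.

    Returns 0 when no positive autocorrelation peak is found.
    """
    # Subtract the noise base; flooring at zero is an assumption.
    residual = [max(p - n, 0.0) for p, n in zip(power, noise_base)]
    n_bins = len(residual)

    best_lag, best_val = 0, 0.0
    for lag in range(min_lag, n_bins // 2 + 1):
        # Autocorrelation of the residual spectrum at this lag.
        val = sum(residual[k] * residual[k + lag] for k in range(n_bins - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag


# Hypothetical spectrum with harmonic peaks every 4 bins on a flat floor.
power = [1.0] * 24
for k in (4, 8, 12, 16, 20):
    power[k] = 10.0
spacing = estimate_pitch_bins(power, [1.0] * 24)
```

Because the comparison is strict, ties between a lag and its multiples resolve to the smallest lag, which corresponds to the fundamental spacing in this example.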

6. The speech enhancement apparatus according to claim 5, further comprising DC component generation means for generating a pseudo peak having a predetermined power at the DC component of the result of subtracting the noise base from the power spectrum of the input signal, wherein the pitch estimation means estimates the speech pitch from the power spectrum in which the DC component generation means has generated the pseudo peak.

7. The speech enhancement apparatus according to claim 1, further comprising noise characteristic estimation means for calculating a moving average of the number of frequency regions in which the power of the result of subtracting the noise base from the power spectrum of the input signal is equal to or greater than a predetermined threshold, wherein the second comb filter creation means creates the second comb filter based on a result of determining, from the moving average, whether the input signal includes speech.

8. The speech enhancement apparatus according to any one of claims 1, further comprising noise characteristic estimation means for calculating a moving average of the number of frequency regions in which the power of the result of subtracting the noise base from the power spectrum of the input signal is equal to or greater than a predetermined threshold, wherein the second comb filter creation means creates a second comb filter having a predetermined frequency range as a stop band when the moving average calculated by the noise characteristic estimation means is equal to or smaller than a predetermined value.
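The noise characteristic estimation of claims 7 and 8 can be illustrated as follows: count the bins whose noise-subtracted power reaches a threshold, smooth that count over frames, and force a fixed range of the second comb filter into the stop band when the smoothed count is small. The helper names, the exponential form of the moving average, and the half-open bin range are assumptions made for this sketch.

```python
def count_active_bins(power, noise_base, threshold):
    """Count bins whose power minus noise base is at least the threshold."""
    return sum(1 for p, n in zip(power, noise_base) if p - n >= threshold)


def update_moving_average(prev_avg, count, alpha=0.9):
    """Exponential moving average of the per-frame bin count (assumed form)."""
    return alpha * prev_avg + (1.0 - alpha) * count


def apply_stop_range(comb, avg, min_avg, stop_range):
    """Set bins lo..hi-1 to stop band when the moving average is <= min_avg."""
    if avg > min_avg:
        return list(comb)
    lo, hi = stop_range
    return [0 if lo <= k < hi else c for k, c in enumerate(comb)]


# Hypothetical frame: two of three bins exceed the noise base by 2.0 or more.
count = count_active_bins([5.0, 1.0, 3.0], [1.0, 1.0, 1.0], threshold=2.0)
avg = update_moving_average(4.0, count, alpha=0.5)
```

A low moving average suggests few speech-like peaks, which is the case in which claim 8 applies the predetermined stop band.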

9. The speech enhancement apparatus according to any one of claims 1 to 8, further comprising SNR estimation means for calculating a signal-to-noise ratio from the power spectrum of the input signal and the noise base, wherein the suppression means determines the noise suppression amount for the frequency-divided spectrum from the signal-to-noise ratio.

10. The speech enhancement apparatus according to claim 9, wherein the SNR estimation means calculates a speech component level from a moving average of the power spectrum of the input signal, calculates a noise component level from the value obtained by multiplying the noise base estimate by a weighting coefficient for each frequency component, and calculates the signal-to-noise ratio from the ratio between the speech component level and the noise component level.
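The SNR computation of claim 10 can be sketched in three small steps. The sketch assumes an exponential moving average over the summed frame power and a simple weighted sum for the noise level; the function names, the smoothing factor, and the small floor that avoids division by zero are hypothetical.

```python
def update_speech_level(prev_level, power, alpha=0.9):
    """Moving average of the total frame power (assumed exponential form)."""
    return alpha * prev_level + (1.0 - alpha) * sum(power)


def noise_level(noise_base, weights):
    """Noise base estimate weighted per frequency component, then summed."""
    return sum(w * n for w, n in zip(weights, noise_base))


def snr(speech_level, noise_lvl, eps=1e-12):
    """Ratio of speech level to noise level; eps guards against a zero noise level."""
    return speech_level / max(noise_lvl, eps)


# Hypothetical two-bin frame.
level = update_speech_level(10.0, [5.0, 5.0], alpha=0.5)
noise = noise_level([1.0, 2.0], [2.0, 0.5])
ratio = snr(6.0, noise)
```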

11. The speech enhancement apparatus according to claim 9 or 10, further comprising fluctuation suppression means for calculating the deviation between the signal-to-noise ratio and a moving average value of the signal-to-noise ratio and updating the moving average value of the signal-to-noise ratio using the deviation, wherein the suppression means determines the noise suppression amount for the frequency-divided spectrum from the moving average value of the signal-to-noise ratio updated by the fluctuation suppression means.
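One simple way to read the fluctuation suppression of claim 11 is as a moving average that is nudged toward the current SNR by a fraction of their deviation. The update rule and the step size below are assumptions chosen for illustration, and the claim itself does not fix them.

```python
def smooth_snr(avg_snr, snr, step=0.1):
    """Update the moving average of the SNR using its deviation from the current SNR."""
    deviation = snr - avg_snr
    return avg_snr + step * deviation


# Hypothetical values: the average moves halfway toward the new SNR.
updated = smooth_snr(10.0, 20.0, step=0.5)
```

A small step keeps the suppression amount from jumping between frames, at the cost of a slower reaction to genuine SNR changes.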

12. The speech enhancement apparatus according to any one of claims 1 to 11, further comprising noise base updating means for calculating, in predetermined time units, two moving average values of the noise base having different update speeds; changing, by means of a second moving average value whose update speed is faster than that of a first moving average value, the condition for updating the first moving average value; and outputting the first moving average value as the noise base estimate.

13. A wireless communication device comprising the noise suppressing device according to claim 1.

14. A noise suppression device comprising: frequency division means for outputting a frequency-divided spectrum obtained by dividing the spectrum of an input signal into predetermined frequency units; noise separation comb filter creation means for creating, based on the frequency-divided spectrum, a noise separation comb filter whose pass band is the frequency regions determined to contain no speech component; suppression means for separating the noise component of the frequency-divided spectrum using the noise separation comb filter; and speech frequency synthesis means for synthesizing the frequency-divided spectrum from which the noise component has been separated into a spectrum signal that is continuous in the frequency domain.
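The noise separation comb filter of claim 14 can be pictured as the complement of a speech mask: bins judged to contain no speech form the pass band, and only those bins are kept. The sketch below is illustrative; the function names, the threshold test against the noise base, and the zeroing of stop-band bins are assumptions.

```python
def make_noise_comb_filter(power, noise_base, threshold):
    """Return 1 (pass band) for bins judged to contain no speech, else 0."""
    return [0 if p > threshold * n else 1 for p, n in zip(power, noise_base)]


def separate_noise(spectrum, noise_comb):
    """Keep only pass-band (noise) bins; stop-band bins are set to zero."""
    return [x if c else 0.0 for x, c in zip(spectrum, noise_comb)]


# Hypothetical frame: bins 0 and 2 carry speech, bins 1 and 3 carry noise only.
power = [9.0, 1.0, 16.0, 0.5]
noise_comb = make_noise_comb_filter(power, [1.0] * 4, threshold=2.0)
noise_only = separate_noise([3.0, 1.0, 4.0, 0.7], noise_comb)
```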

15. The noise suppression device according to claim 14, wherein the noise separation means multiplies the real part and the imaginary part of the input speech spectrum by mutually different random numbers and by the noise base estimate in the pass band of the noise separation comb filter.

16. The noise suppression device according to claim 14 or 15, further comprising noise component storage means for storing the spectral components of the input speech in the stop band of a speech separation comb filter, wherein the noise separation means uses the stored spectral components in the pass band of the noise separation comb filter.

17. A wireless communication device comprising the noise suppressing device according to claim 14.

18. A sound source separation device comprising the speech enhancement apparatus according to any one of claims 1 to 12 and the noise suppression apparatus according to any one of claims 14 to 16.

19. A speech enhancement method comprising: a frequency division step of outputting a frequency-divided spectrum obtained by dividing the spectrum of an input signal into predetermined frequency units; a first comb filter creation step of creating, based on the frequency-divided spectrum, a first comb filter that attenuates the signal in frequency regions determined to contain no speech component; a second comb filter creation step of creating, based on the frequency-divided spectrum, a second comb filter from which more noise peaks are removed than from the first comb filter; a comb filter correction step of correcting the speech pitch harmonic structure contained in the first comb filter using a speech pitch estimated from the second comb filter and the frequency-divided spectrum; a suppression step of suppressing noise in the frequency-divided spectrum using the first comb filter corrected in the comb filter correction step; and a speech frequency synthesis step of synthesizing the noise-suppressed frequency-divided spectrum into a spectrum signal that is continuous in the frequency domain.

20. A noise separation method comprising: a frequency division step of outputting a frequency-divided spectrum obtained by dividing the spectrum of an input signal into predetermined frequency units; a noise separation comb filter creation step of creating, based on the frequency-divided spectrum, a noise separation comb filter whose pass band is the frequency regions determined to contain no speech component; a suppression step of separating the noise component of the frequency-divided spectrum using the noise separation comb filter; and a speech frequency synthesis step of synthesizing the frequency-divided spectrum from which the noise component has been separated into a spectrum signal that is continuous in the frequency domain.

21. A speech enhancement program causing a computer to execute: a frequency division step of outputting a frequency-divided spectrum obtained by dividing the spectrum of an input signal into predetermined frequency units; a first comb filter creation step of creating, based on the frequency-divided spectrum, a first comb filter that attenuates the signal in frequency regions determined to contain no speech component; a second comb filter creation step of creating, based on the frequency-divided spectrum, a second comb filter from which more noise peaks are removed than from the first comb filter; a comb filter correction step of correcting the speech pitch harmonic structure contained in the first comb filter using a speech pitch estimated from the second comb filter and the frequency-divided spectrum; a suppression step of suppressing noise in the frequency-divided spectrum using the first comb filter corrected in the comb filter correction step; and a speech frequency synthesis step of synthesizing the noise-suppressed frequency-divided spectrum into a spectrum signal that is continuous in the frequency domain.

22. A noise separation program causing a computer to execute: a frequency division step of outputting a frequency-divided spectrum obtained by dividing the spectrum of an input signal into predetermined frequency units; a noise separation comb filter creation step of creating, based on the frequency-divided spectrum, a noise separation comb filter whose pass band is the frequency regions determined to contain no speech component; a suppression step of separating the noise component of the frequency-divided spectrum using the noise separation comb filter; and a speech frequency synthesis step of synthesizing the frequency-divided spectrum from which the noise component has been separated into a spectrum signal that is continuous in the frequency domain.

23. A server device that stores a speech enhancement program and outputs the speech enhancement program, the speech enhancement program causing a computer to execute: a frequency division step of outputting a frequency-divided spectrum obtained by dividing the spectrum of an input signal into predetermined frequency units; a first comb filter creation step of creating, based on the frequency-divided spectrum, a first comb filter that attenuates the signal in frequency regions determined to contain no speech component; a second comb filter creation step of creating, based on the frequency-divided spectrum, a second comb filter from which more noise peaks are removed than from the first comb filter; a comb filter correction step of correcting the speech pitch harmonic structure contained in the first comb filter using a speech pitch estimated from the second comb filter and the frequency-divided spectrum; a suppression step of suppressing noise in the frequency-divided spectrum using the first comb filter corrected in the comb filter correction step; and a speech frequency synthesis step of synthesizing the noise-suppressed frequency-divided spectrum into a spectrum signal that is continuous in the frequency domain.

24. A server device that stores a noise separation program and outputs the noise separation program in response to a request, the noise separation program causing a computer to execute: a frequency division step of outputting a frequency-divided spectrum obtained by dividing the spectrum of an input signal into predetermined frequency units; a noise separation comb filter creation step of creating, based on the frequency-divided spectrum, a noise separation comb filter whose pass band is the frequency regions determined to contain no speech component; a suppression step of separating the noise component of the frequency-divided spectrum using the noise separation comb filter; and a speech frequency synthesis step of synthesizing the frequency-divided spectrum from which the noise component has been separated into a spectrum signal that is continuous in the frequency domain.