JP5443547B2

JP5443547B2 - Signal processing device

Info

Publication number: JP5443547B2
Application number: JP2012144135A
Authority: JP
Inventors: 隆須藤; 将高長田
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2012-06-27
Filing date: 2012-06-27
Publication date: 2014-03-19
Anticipated expiration: 2029-03-24
Also published as: JP2012181561A

Description

The present invention relates to a signal processing apparatus that improves the intelligibility of signals such as speech, music, and audio.

When a signal such as speech, music, or audio is reproduced, the intelligibility of the desired signal (hereinafter referred to as the target signal) may be degraded by ambient noise and other components that are not part of the target signal. To improve the intelligibility of the target signal, signal processing must therefore be applied in accordance with the ambient noise contained in the collected sound signal. Conventional methods of this kind include a method that uses the volume of the ambient noise and a method that uses the frequency characteristics of the ambient noise (for example, Patent Document 1).

Patent Document 1: JP 2001-188599 A

However, the target signal and the ambient noise may be band-limited to different frequency bands, so that the frequency bands in which their signal components exist differ, or they may have different sampling frequencies. In such cases, a conventional signal processing apparatus cannot obtain the volume or frequency characteristics of the ambient noise with high accuracy, which degrades sound quality and prevents any improvement in intelligibility.

Furthermore, conventional band extension techniques developed for target signals such as speech signals and music or audio signals, for example techniques that use aliasing, a nonlinear function, or linear prediction analysis, cannot estimate the frequency characteristics of the ambient noise with high accuracy when they are applied unchanged to extend the band of the collected ambient noise.

The present invention has been made to solve the above problems. Its object is to provide a signal processing apparatus capable of improving intelligibility even when the target signal to be reproduced and the ambient noise are band-limited to different frequency bands, so that the frequency bands in which their signal components exist differ, or when their sampling frequencies differ.

To achieve the above object, the present invention provides a signal processing apparatus that changes the frequency characteristics of an input signal band-limited to a first frequency range, the apparatus comprising: ambient noise extraction means for extracting ambient noise contained in a collected sound signal; information extraction means for extracting frequency characteristic information in a second frequency range from the ambient noise extracted by the ambient noise extraction means; frequency characteristic information extension means for extending, in the frequency direction, the frequency characteristic information extracted by the information extraction means to the first frequency range; and signal correction means for changing the frequency characteristics of the input signal in accordance with the frequency characteristic information obtained by the frequency characteristic information extension means.

According to the present invention, it is possible to provide a signal processing apparatus capable of improving intelligibility even when the target signal to be reproduced and the ambient noise are band-limited to different frequency bands, so that the frequency bands in which their signal components exist differ, or when their sampling frequencies differ.

FIG. 1 is a circuit block diagram showing the configuration of a communication apparatus to which a first embodiment of the signal processing apparatus according to the present invention is applied.
FIG. 2 is a circuit block diagram showing the configuration of the first embodiment of the signal processing unit according to the present invention.
FIG. 3 is a circuit block diagram showing a configuration example of the ambient noise estimation unit of the signal processing unit shown in FIG. 2.
FIG. 4 is a circuit block diagram showing a configuration example of the ambient noise information band extension unit of the signal processing unit shown in FIG. 2.
FIG. 5 is a processing flowchart explaining the dictionary generation method for the dictionary storage unit of the ambient noise information band extension unit shown in FIG. 4.
FIG. 6 is a circuit block diagram showing a configuration example of the signal characteristic correction unit of the signal processing unit shown in FIG. 2.
FIG. 7 is a circuit block diagram showing the configuration of a communication apparatus and a digital audio player to which the first embodiment of the signal processing apparatus according to the present invention is applied.
FIG. 8 is a circuit block diagram showing the configuration of Modification 1 of the signal processing unit according to the present invention.
FIG. 9 is a circuit block diagram showing a configuration example of the ambient noise information band extension unit of the signal processing unit shown in FIG. 8.
FIG. 10 is a processing flowchart explaining the dictionary generation method for the dictionary storage unit of the ambient noise information band extension unit shown in FIG. 9.
FIG. 11 is a diagram showing an example of a wideband masking threshold.
FIG. 12 is a circuit block diagram showing a configuration example of the signal characteristic correction unit of the signal processing unit shown in FIG. 8.
FIG. 13 is a circuit block diagram showing the configuration of Modification 3 of the signal processing unit according to the present invention.
FIG. 14 is a circuit block diagram showing a configuration example of the ambient noise information band extension unit of the signal processing apparatus shown in FIG. 13.
FIG. 15 is a processing flowchart explaining the dictionary generation method for the dictionary storage unit of the ambient noise information band extension unit shown in FIG. 14.
FIG. 16 is a diagram showing an example for explaining the operation of the threshold correction unit of the ambient noise information band extension unit shown in FIG. 14.
FIG. 17 is a processing flowchart explaining another dictionary generation method for the dictionary storage unit of the ambient noise information band extension unit shown in FIG. 14.
FIG. 18 is a processing flowchart explaining another dictionary generation method for the dictionary storage unit of the ambient noise information band extension unit shown in FIG. 14.
FIG. 19 is a circuit block diagram showing the configuration of a communication apparatus and a digital audio player to which a second embodiment of the signal processing unit according to the present invention is applied.
FIG. 20 is a circuit block diagram showing the configuration of the second embodiment of the signal processing apparatus according to the present invention.
FIG. 21 is a circuit block diagram showing a configuration example of the ambient noise estimation unit and the ambient noise suppression processing unit of the signal processing apparatus shown in FIG. 20.

Embodiments of the present invention will be described below with reference to the drawings.
(First embodiment)
FIG. 1 shows the configuration of a communication apparatus according to an embodiment of the present invention. The communication apparatus shown in this figure is the receiving system of a wireless communication apparatus such as a mobile phone, and comprises a wireless communication unit 1, a decoder 2, a signal processing unit 3, a digital-to-analog (D/A) converter 4, a speaker 5, a microphone 6, an analog-to-digital (A/D) converter 7, a downsampling unit 8, an echo suppression processing unit 9, and an encoder 10. In this embodiment, the target signal to be reproduced is assumed to be the speech signal of the far-end speaker contained in the received input signal.

The wireless communication unit 1 communicates wirelessly with a wireless base station accommodated in a mobile communication network, and establishes a communication link with the communication partner station through the wireless base station and the mobile communication network in order to communicate with it.

The decoder 2 decodes the reception data that the wireless communication unit 1 receives from the communication partner station for each frame (= 20 [ms]), a predetermined unit of time, obtains a digital input signal x[n] (n=0,1,…2N-1), and outputs it to the signal processing unit 3 frame by frame. The input signal x[n] is a wideband signal with a sampling frequency of fs' [Hz], band-limited to the range from fs_wb_low [Hz] to fs_wb_high [Hz]. Here, the relationship to the sampling frequency fs [Hz] of the collected sound signal z[n] described later is fs' = 2fs. The data length of one frame at the sampling frequency fs' [Hz] is 2N samples; that is, N = 20 [ms] × fs [Hz] ÷ 1000.

The signal processing unit 3 applies signal correction processing to the input signal x[n] (n=0,1,…2N-1) frame by frame, in accordance with the collected sound signal z[n] (n=0,1,…N-1) whose echo has been reduced by the echo suppression processing unit 9 described later. It changes the volume or frequency characteristics of the input signal and outputs the result as the output signal y[n] (n=0,1,…2N-1) to the D/A converter 4 and the downsampling unit 8. A specific configuration example of the signal processing unit 3 is described in detail later.

The D/A converter 4 converts the corrected output signal y[n] into an analog signal y(t) and outputs it to the speaker 5. The speaker 5 outputs the analog output signal y(t) into the acoustic space.

The microphone 6 picks up sound to obtain the collected sound signal z(t), an analog signal, and outputs it to the A/D converter 7. This analog signal is a mixture of the speech signal of the near-end speaker, noise components arising from the surrounding environment, echo components arising from the output signal y(t) and the acoustic space, and so on. Examples of noise components include noise from trains, car noise, and street noise in crowds. In this embodiment, because the speech signal of the near-end speaker is the signal that the communication apparatus needs for communication with the communication partner station, all components other than the speech signal of the near-end speaker are treated as ambient noise.

The A/D converter 7 converts the analog collected sound signal z(t) into a digital signal, obtains the digital collected sound signal z'[n] (n=0,1,…N-1), and outputs it to the echo suppression processing unit 9 in units of N samples. The collected sound signal z'[n] is a narrowband signal with a sampling frequency of fs [Hz], band-limited to the range from fs_nb_low [Hz] to fs_nb_high [Hz]. The following relationship is assumed to hold: fs_wb_low ≦ fs_nb_low < fs_nb_high < fs/2 ≦ fs_wb_high < fs'/2.

The downsampling unit 8 downsamples the output signal y[n] from the sampling frequency fs' [Hz] to fs [Hz], band-limits it to the range from fs_nb_low [Hz] to fs_nb_high [Hz], and outputs the resulting signal as y'[n] (n=0,1,…N-1) to the echo suppression processing unit 9.
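The downsampling step can be illustrated with a minimal sketch. The patent text does not specify the filter, so the Hamming-windowed sinc low-pass filter, the tap count, and the function names `lowpass_taps` and `downsample_by_two` are all assumptions made for illustration. The sketch covers only the upper band edge (fs_nb_high); the high-pass behaviour at fs_nb_low is omitted.

```python
import math

def lowpass_taps(num_taps, cutoff):
    """Hamming-windowed sinc low-pass filter (an assumed design).

    cutoff is the cut-off frequency as a fraction of the input sampling rate.
    """
    mid = (num_taps - 1) / 2
    taps = []
    for i in range(num_taps):
        t = i - mid
        if t == 0:
            sinc = 2 * cutoff
        else:
            sinc = math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * i / (num_taps - 1))
        taps.append(sinc * window)
    gain = sum(taps)
    return [c / gain for c in taps]  # normalise to unity gain at DC

def downsample_by_two(y, taps):
    """Low-pass filter y (rate fs') and keep every second sample (rate fs)."""
    out = []
    for n in range(0, len(y), 2):
        acc = 0.0
        for k, c in enumerate(taps):
            if 0 <= n - k < len(y):
                acc += c * y[n - k]
        out.append(acc)
    return out

# Example values from the description: fs' = 16000 Hz, fs_nb_high = 3950 Hz.
taps = lowpass_taps(31, 3950 / 16000)
y_frame = [1.0] * 320                 # one frame of 2N = 320 samples
y_down = downsample_by_two(y_frame, taps)
```

With a frame of 2N = 320 samples at fs', the sketch returns N = 160 samples at fs, matching the frame length expected by the echo suppression processing unit 9.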

The echo suppression processing unit 9 uses the downsampled output signal y'[n] (n=0,1,…N-1) to reduce the echo component contained in the collected sound signal z'[n] (n=0,1,…N-1), and outputs the echo-reduced signal as z[n] (n=0,1,…N-1) to the signal processing unit 3 and the encoder 10. The echo suppression processing unit 9 may be implemented with existing techniques such as those described in Japanese Patent No. 4047867, JP 2006-203358 A, and JP 2007-60644 A.

The encoder 10 encodes the collected sound signal z[n] (n=0,1,…N-1), echo-reduced by the echo suppression processing unit 9, every N samples and outputs the result to the wireless communication unit 1, which transmits it to the communication partner station as transmission data.

Next, an embodiment of the signal processing unit 3 will be described. In the following description, the example values are fs = 8000 [Hz], fs' = 16000 [Hz], fs_nb_low = 340 [Hz], fs_nb_high = 3950 [Hz], fs_wb_low = 50 [Hz], and fs_wb_high = 7950 [Hz]. The band-limiting frequencies and the sampling frequencies are not limited to these values. N = 160 is used here.
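As a quick consistency check, the example values can be substituted into the frame length formula N = 20 [ms] × fs [Hz] ÷ 1000 and into the band relationship stated for the A/D converter 7. The sketch below is only an illustration; the helper name `frame_length` and the variable names are assumptions.

```python
def frame_length(fs_hz, frame_ms=20):
    """Number of samples in one frame: frame_ms [ms] x fs [Hz] / 1000."""
    return frame_ms * fs_hz // 1000

fs, fs_wide = 8000, 16000             # fs and fs' = 2 * fs
fs_nb_low, fs_nb_high = 340, 3950     # narrowband limits [Hz]
fs_wb_low, fs_wb_high = 50, 7950      # wideband limits [Hz]

N = frame_length(fs)                  # samples per frame at fs
two_N = frame_length(fs_wide)         # samples per frame at fs'

# fs_wb_low <= fs_nb_low < fs_nb_high < fs/2 <= fs_wb_high < fs'/2
band_ok = (fs_wb_low <= fs_nb_low < fs_nb_high < fs / 2
           <= fs_wb_high < fs_wide / 2)
```

Here `N` evaluates to 160 and `two_N` to 320, and `band_ok` is true for the example values.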

FIG. 2 shows a configuration example of the signal processing unit 3. The signal processing unit 3 comprises an ambient noise estimation unit 31, an ambient noise information band extension unit 32, and a signal characteristic correction unit 33. These units can also be realized by a single processor and software recorded on a storage medium (not shown).

The ambient noise estimation unit 31 estimates, as ambient noise, the signal other than the speech signal of the near-end speaker in the signal echo-reduced by the echo suppression processing unit 9, and extracts a feature quantity that characterizes this ambient noise. Because the collected sound signal z[n] is a narrowband signal, the ambient noise is also a narrowband signal. The feature quantity that characterizes the ambient noise is therefore called narrowband signal information. The narrowband signal information may be any feature quantity that characterizes the ambient noise, such as a power spectrum, an amplitude or phase spectrum, PARCOR coefficients or reflection coefficients, line spectral frequencies, cepstrum coefficients, or mel-cepstrum coefficients.

The ambient noise information band extension unit 32 uses the narrowband signal information to estimate the feature quantity that would characterize the ambient noise if it were extended to the same frequency band (wideband) as the input signal x[n]. This feature quantity is called wideband signal information.

The signal characteristic correction unit 33 corrects the signal characteristics of the target signal using the wideband signal information obtained by the ambient noise information band extension unit 32.

In this way, even when the ambient noise is a narrowband signal, estimating the feature quantity it would have if extended to the wideband allows the correction processing in the signal characteristic correction unit 33 to improve intelligibility.

A specific configuration of the signal processing unit 3 is described below. In the following description, the narrowband signal information is the power spectrum of the ambient noise, and the wideband signal information is the power value obtained when the ambient noise is extended to a wideband signal (the wideband power value).

FIG. 3 shows a configuration example of the ambient noise estimation unit 31. The ambient noise estimation unit 31 comprises a frequency domain transform unit 311, a power calculation unit 312, an ambient noise section determination unit 313, and a frequency spectrum update unit 314.

The ambient noise estimation unit 31 estimates the ambient noise, that is, everything other than the speech signal of the near-end speaker, from the collected sound signal z[n] (n=0,1,…N-1) echo-reduced by the echo suppression processing unit 9, extracts the power spectrum |N[f,w]|² of this signal, and outputs it to the ambient noise information band extension unit 32.

The frequency domain transform unit 311 receives the collected sound signal z[n] (n=0,1,…N-1) of the current frame f. It extracts, from the collected sound signal of the preceding frame, as many samples as the windowing overlap requires, joins them in the time direction with the input signal of the current frame, applies zero padding as appropriate, and thereby obtains the number of samples needed for the frequency domain transform. The overlap, defined as the ratio of the shift width of the collected sound signal z[n] at the next frame to the data length of z[n], may be 50%. Here, however, as an example, the number of overlap samples with the preceding frame is L = 48, and 2M = 256 samples are prepared from L samples of the collected sound signal of the preceding frame, N = 160 samples of the collected sound signal z[n] of the current frame, and L samples of zero padding. These 2M samples are windowed by multiplying them by a sine window function, and the windowed 2M-sample signal is then transformed to the frequency domain. The transform can be carried out, for example, by an FFT of order 2M. Although zero padding is used here to make the data length a power of two (2M) so that the order of the frequency domain transform is also a power of two (2M), the order of the frequency domain transform is not limited to this.
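The frame assembly and windowing described above can be sketched as follows. The exact definition of the sine window is not given in the text, so the half-sample-offset form used in `sine_window` is an assumption, as are the function names.

```python
import math

def build_analysis_frame(prev_tail, current, overlap):
    """Join L old samples, N current samples and L zeros into 2M samples."""
    if len(prev_tail) != overlap:
        raise ValueError("prev_tail must hold exactly `overlap` samples")
    return list(prev_tail) + list(current) + [0.0] * overlap

def sine_window(length):
    """Sine window with a half-sample offset (assumed form)."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def window_frame(frame):
    """Multiply the frame sample by sample with the sine window."""
    return [s * w for s, w in zip(frame, sine_window(len(frame)))]

L, N = 48, 160                         # example values from the description
prev_frame = [0.0] * N                 # collected sound signal, frame f-1
current = [1.0] * N                    # collected sound signal, frame f
frame = build_analysis_frame(prev_frame[-L:], current, L)
windowed = window_frame(frame)
```

The assembled frame has L + N + L = 256 samples, so it can be passed directly to a 256-point FFT.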

When the collected sound signal z[n] is a real signal, removing the redundant M = 128 bins from the signal obtained by the frequency domain transform yields the frequency spectrum Z[f,w] (w=0,1,…M-1), which is output. Here w denotes the frequency bin. Strictly speaking, only M-1 (=127) bins are redundant for a real signal, and the highest frequency bin w = M (=128) should be taken into account. However, the signal transformed here is assumed to be a digital signal containing a band-limited speech signal, and because of the band limitation, ignoring the highest frequency bin w = M does not affect sound quality. To simplify the explanation, the description below therefore does not consider the highest frequency bin w = M. The highest frequency bin w = M may of course be taken into account, in which case it is either treated in the same way as w = M-1 or handled separately.

The window function used for windowing is not limited to the Hamming window and may be changed as appropriate to another symmetric window (Hanning window, Blackman window, sine window, and so on) or to an asymmetric window such as those used in speech coding. The frequency domain transform may also be replaced by another orthogonal transform to the frequency domain, such as the DFT (Discrete Fourier Transform) or the DCT (Discrete Cosine Transform).

The power calculation unit 312 calculates and outputs the power spectrum |Z[f,w]|² (w=0,1,…M-1), which is the sum of the squares of the real and imaginary parts of the frequency spectrum Z[f,w] (w=0,1,…M-1) output from the frequency domain transform unit 311.
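A minimal sketch of the power calculation, assuming a direct DFT in place of the FFT for brevity. Only the M non-redundant bins w = 0 … M-1 of a 2M-sample real frame are evaluated, and the power is the sum of the squared real and imaginary parts. The function names `dft_bins` and `power_spectrum` are hypothetical.

```python
import cmath
import math

def dft_bins(x, num_bins):
    """Direct DFT of x, evaluated only for bins 0 .. num_bins-1."""
    size = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / size)
                for n in range(size))
            for k in range(num_bins)]

def power_spectrum(frame):
    """|Z[w]|^2 for w = 0 .. M-1, where the frame holds 2M real samples."""
    half = len(frame) // 2             # M
    return [z.real ** 2 + z.imag ** 2 for z in dft_bins(frame, half)]

# Small example: a cosine sitting exactly on bin 2 of an 8-point frame.
frame = [math.cos(2 * math.pi * 2 * n / 8) for n in range(8)]
z_pow = power_spectrum(frame)
```

For this test tone all of the power falls into bin 2 of the four bins kept; the mirrored bin 6 is one of the redundant bins that are dropped.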

The ambient noise section determination unit 313 uses the collected sound signal z[n] (n=0,1,…N-1), the power spectrum |Z[f,w]|² (w=0,1,…M-1) output from the power calculation unit 312, and the ambient noise power spectrum |N[f-1,w]|² of each frequency band one frame earlier, output from the frequency spectrum update unit 314, to determine for each frame whether the frame is a section in which ambient noise is dominant in the collected sound signal z[n] (an ambient noise section) or a section in which the speech signal of the near-end speaker, which is not treated as ambient noise, is mixed with ambient noise (a speech section). It outputs frame determination information vad[f] representing the result for each frame. Here, vad[f] = 0 for an ambient noise section and vad[f] = 1 for a speech section. Hereinafter, a component is said to be "dominantly included" when only that component is present or when it is present in a far larger amount than the other components (at or above a predetermined threshold).

Specifically, a plurality of feature quantities are calculated from the collected sound signal z[n] (n=0,1,…N-1), the power spectrum |Z[f,w]|², and the ambient noise power spectrum |N[f-1,w]|² of the preceding frame, and the frame determination information vad[f] is output. The feature quantities used as examples here are the first-order autocorrelation coefficient Acorr[f,1], the autocorrelation coefficient maximum value Acorr_max[f], the per-frequency SN ratio sum snr_sum[f], and the per-frequency SN ratio variance snr_var[f].

まず、式（１）に示すように、フレーム単位でのパワーで正規化されて絶対値をとったk次自己相関係数Acorr[f,k] (k=1,…N-1)を計算する。

このとき併せて、k=1である１次自己相関係数Acorr[f,1]も計算する。１次自己相関係数Acorr[f,1]は０から１の値をとり、０に近づくほどノイズ性が強い。つまり、１次自己相関係数の値が小さいほど、集音信号に周囲雑音が多く含まれ、周囲雑音には含まない音声信号が少ないと判断される。そして、正規化されたk次自己相関係数Acorr[f,k] (k=1,…N-1)から式（２）に示すように、最大となる自己相関係数Acorr[f,k]を計算して、自己相関係数最大値Acorr_max[f]とする。自己相関係数最大値Acorr_max[f]は０から１の値をとり、０に近づくほどノイズ性が強い。つまり、自己相関係数の値が小さいほど、集音信号に周囲雑音が多く含まれ、周囲雑音には含まない音声信号が少ないと判断される。

次に、パワースペクトル|Z[f,w]|²と周囲雑音のパワースペクトル|N[f,w]|²とを入力として、それらの比である各周波数帯域のＳＮ比を、ここではｄＢ表現したsnr[f,w] (w=0,1,…M-1)として式（３）で算出する。

そして、各周波数帯域のＳＮ比snr[f,w] (w=0,1,…M-1)の和を式（４）で算出し、周波数別ＳＮ比総和値snr_sum[f]とする。周波数別ＳＮ比総和値snr_sum[f]は０以上の値をとり、この値が小さいほど集音信号中にノイズ成分である周囲雑音が多く含まれ、周囲雑音には含まない音声信号が少ないと判断される。

また、各周波数帯域のＳＮ比snr[f,w] (w=0,1,…M-1)の分散を式（５）で算出し、周波数別ＳＮ比分散値snr_var[f]とする。周波数別ＳＮ比分散値snr_var[f]は０以上の値をとり、この値が小さいほどノイズ成分である周囲雑音が多く含まれ、周囲雑音には含まない音声信号が少ないと判断される。

最後に、複数の特徴量である、１次自己相関係数Acorr[f,１]、自己相関係数最大値Acorr_max[f]、周波数別ＳＮ比総和値snr_sum[f]、周波数別ＳＮ比分散値snr_var[f]を用いて、これらにそれぞれ所定の重み付けによる重み付けを行い、複数の特徴量の重み付け和として周囲雑音度合type[f]を算出する。ここでは、周囲雑音度合type[f]が小さいほど周囲雑音が支配的であるとし、大きいほど周囲雑音には含まない音声信号が支配的であるとしているので、例えば、線形識別関数による判定を用いた学習アルゴリズムなどで重みw_1、w₂、w₃、w₄（ただしw₁≧０、w₂≧０、w₃≧０、w₄≧０）を設定して、式（６）で算出する。そして、周囲雑音度合type[f]が所定の閾値THRよりも大きければvad[f]=1とし、周囲雑音度合type[f]が所定の閾値THR以下であればvad[f]=0とする。

以上の説明では、複数の特徴量を求める際に、周波数ビンごとに処理するとして説明したが、周波数領域変換による隣接する複数の周波数ビンをまとめてグループを作り、そのグループ単位で処理を行っても構わない。また、フィルタバンクなどの帯域分割フィルタなどの周波数領域変換によって実現してもよい。 First, as shown in Equation (1), the k-th order autocorrelation coefficient Acorr [f, k] (k = 1,... N-1) is calculated by taking the absolute value by normalizing with the power in units of frames. To do.

At the same time, the first-order autocorrelation coefficient Acorr [f, 1] with k = 1 is also calculated. The primary autocorrelation coefficient Acorr [f, 1] takes a value from 0 to 1, and the closer to 0, the stronger the noise characteristic. That is, it is determined that the smaller the value of the primary autocorrelation coefficient, the more ambient noise is included in the collected sound signal and the fewer audio signals are not included in the ambient noise. Then, from the normalized k-th order autocorrelation coefficient Acorr [f, k] (k = 1,... N−1), as shown in the equation (2), the maximum autocorrelation coefficient Acorr [f, k ] Is calculated as the autocorrelation coefficient maximum value Acorr_max [f]. The autocorrelation coefficient maximum value Acorr_max [f] takes a value from 0 to 1, and the closer to 0, the stronger the noise characteristic. That is, it is determined that the smaller the value of the autocorrelation coefficient, the more ambient noise is included in the collected sound signal, and the fewer audio signals are not included in the ambient noise.

Next, the power spectrum | Z [f, w] | ² and the power spectrum | N [f, w] | ^{2 of the} ambient noise are input, and the SN ratio of each frequency band, which is the ratio thereof, is expressed here as dB. It is calculated by the expression (3) as expressed snr [f, w] (w = 0, 1,... M−1).

Then, the sum of the SN ratios snr [f, w] (w = 0, 1,... M−1) of each frequency band is calculated by Expression (4), and is set as the frequency-specific SN ratio sum value snr_sum [f]. The SN ratio total value snr_sum [f] for each frequency takes a value of 0 or more. The smaller this value, the more ambient noise that is a noise component is included in the collected sound signal, and the less the audio signal is not included in the ambient noise. To be judged.

Further, the variance of the SN ratio snr [f, w] (w = 0, 1,... M-1) of each frequency band is calculated by the equation (5), and is set as the frequency-specific SN ratio variance value snr_var [f]. The frequency-specific SN ratio variance value snr_var [f] takes a value of 0 or more, and it is determined that the smaller this value is, the more ambient noise that is a noise component is included, and the fewer audio signals are not included in the ambient noise.

Finally, there are a plurality of feature quantities, such as primary autocorrelation coefficient Acorr [f, 1], autocorrelation coefficient maximum value Acorr_max [f], frequency-specific SN ratio sum value snr_sum [f], frequency-specific SN ratio variance Using the value snr_var [f], each of these is weighted with a predetermined weight, and the ambient noise degree type [f] is calculated as a weighted sum of a plurality of feature amounts. Here, it is assumed that the ambient noise is dominant as the ambient noise degree type [f] is small, and the voice signal that is not included in the ambient noise is dominant as the magnitude is large. such as in the stomach learning algorithm by setting the weight _{_{_{w 1, w 2, w 3}}} , w 4 ( provided that _{_{w 1 ≧ 0, w 2 ≧}} 0, w 3 ≧ 0, w 4 ≧ 0), calculated by the formula (6) To do. If the ambient noise level type [f] is larger than the predetermined threshold value THR, vad [f] = 1 is set. If the ambient noise level type [f] is equal to or lower than the predetermined threshold value THR, vad [f] = 0 is set. .

In the above description, the feature quantities are obtained for each frequency bin. However, adjacent frequency bins produced by the frequency domain transform may be grouped together and processed in units of groups. The frequency domain transform may also be realized by band division filters such as a filter bank.

Note that not all of the feature quantities described above need to be used, and other feature quantities may be added. In addition, codec information output from the wireless communication unit 1 or the decoder 2 may be used, for example, voice detection information from a silence insertion descriptor (SID) or a voice detector (VAD) indicating whether or not a frame is speech, or information on whether pseudo background noise has been generated.

The frequency spectrum update unit 314 uses the frame determination information vad[f] output from the ambient noise section determination unit 313 and the power spectrum |Z[f,w]|² (w = 0, 1, …, M−1) output from the power calculation unit 312 to estimate and output |N[f,w]|² (w = 0, 1, …, M−1), the power spectrum of the ambient noise in each frequency band. For example, the power spectra |Z[f,w]|² of frames for which vad[f] is 0, that is, frames determined to be sections in which ambient noise is dominant (ambient noise sections), are averaged with frame-by-frame forgetting, and the resulting average power spectrum is output as the ambient noise power spectrum |N[f,w]|² (w = 0, 1, …, M−1) of each frequency band. Specifically, |N[f,w]|² is calculated recursively using the ambient noise power spectrum of the previous frame, |N[f−1,w]|², as shown in Expression (7). The forgetting factor α_N[ω] in Expression (7) is a coefficient of 1 or less, preferably about 0.75 to 0.95.
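The recursive averaging performed by the frequency spectrum update unit 314 can be sketched as follows. The recursion expression itself is not reproduced in this text, so the first-order form below is an assumption. Holding the previous estimate during frames with vad[f] = 1 and using a single forgetting factor for all bins are also assumptions.

```python
def update_noise_spectrum(noise_prev, power_z, vad, alpha=0.9):
    """One frame of recursive ambient noise power spectrum estimation.

    noise_prev: |N[f-1, w]|^2 for w = 0 .. M-1
    power_z:    |Z[f, w]|^2   for w = 0 .. M-1
    vad:        frame determination vad[f] (0 = ambient noise section)
    alpha:      forgetting factor (the text suggests about 0.75 to 0.95)

    Assumed recursion: N[f, w] = alpha * N[f-1, w] + (1 - alpha) * Z[f, w].
    """
    if vad != 0:
        # Frame not dominated by ambient noise: keep the previous estimate.
        return list(noise_prev)
    return [alpha * n + (1.0 - alpha) * z for n, z in zip(noise_prev, power_z)]


noise = [1.0, 1.0]
noise_after_noise_frame = update_noise_spectrum(noise, [3.0, 1.0], vad=0, alpha=0.75)
noise_after_speech_frame = update_noise_spectrum(noise, [100.0, 100.0], vad=1, alpha=0.75)
```

A larger forgetting factor makes the estimate smoother but slower to follow changes in the ambient noise.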

The ambient noise information band extension unit 32 uses the ambient noise power spectrum |N[f,w]|² of each frequency band to generate the power value of a signal that includes the frequency band components present in the input signal x[n] but absent from the collected sound signal z[n].

FIG. 4 is a diagram illustrating a configuration example of the ambient noise information band extension unit 32. The ambient noise information band extension unit 32 includes a power normalization unit 321, a dictionary storage unit 322, and a wideband power calculation unit 323.

The ambient noise information band extension unit 32 calculates narrowband feature data from the narrowband signal information. The correspondence between narrowband feature data and wideband feature data is modeled in advance, and the unit uses this model together with the narrowband feature data just obtained to calculate wideband feature data, from which it generates wideband signal information. As described above, the narrowband signal information here is the power spectrum of the ambient noise. The wideband feature data and the wideband signal information are taken to be identical here, and the wideband signal information is the volume represented by the wideband power value N_wb_level[f]. The correspondence between narrowband and wideband feature data is modeled with a GMM (Gaussian mixture model). The narrowband power value Pow_N[f] and the normalized ambient noise power spectrum |Nn[f,w]|² (w = 0, 1, …, M−1) are concatenated in the order (dimension) direction and used as the Dnb-order narrowband feature data, and the wideband power value N_wb_level[f] is used as the Dwb-order wideband feature data (Dnb = M + 1, Dwb = 1).

First, to calculate the narrowband feature data from the narrowband signal information, the power normalization unit 321 receives the ambient noise power spectrum |N[f,w]|² (w = 0, 1, …, M−1) output from the ambient noise estimation unit 31 and calculates the narrowband feature data from it. One item of narrowband feature data is the narrowband power value Pow_N[f], the sum of the power spectrum over all frequency bins, calculated by Expression (8).

As the other narrowband feature data, the normalized power spectrum |Nn[f,w]|² is calculated by normalizing the power spectrum |N[f,w]|² of each frequency bin with the narrowband power value Pow_N[f] according to Expression (9).
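The processing of the power normalization unit 321 can be sketched as follows. Expressions (8) and (9) are not reproduced in this text, so the sketch assumes that Pow_N[f] is a plain sum over the bins and that normalization is division by Pow_N[f]. Placing the power value first in the concatenated vector is likewise an assumption.

```python
def narrowband_features(noise_power):
    """Build the (M + 1)-order narrowband feature vector for one frame.

    noise_power: |N[f, w]|^2 for w = 0 .. M-1.
    Returns [Pow_N, |Nn[f,0]|^2, ..., |Nn[f,M-1]|^2].
    """
    pow_n = sum(noise_power)  # narrowband power value Pow_N[f]
    if pow_n <= 0.0:
        raise ValueError("total power must be positive to normalize")
    normalized = [p / pow_n for p in noise_power]  # shape of the spectrum
    return [pow_n] + normalized


features = narrowband_features([1.0, 3.0, 4.0, 0.0])
```

Separating the overall level (Pow_N) from the spectral shape (the normalized spectrum, which sums to 1) lets the model treat volume and spectral tilt as distinct inputs.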

The dictionary storage unit 322 stores a GMM dictionary λ1_q = {w_q, μ_q, Σ_q} (q = 1, …, Q) with mixture number Q (here Q = 64), trained on ambient noise collected in advance so as to model the correspondence between the Dnb-order narrowband feature data and the Dwb-order wideband feature data. Here, w_q is the mixture weight of the q-th normal distribution in the mixture, μ_q is its mean vector, and Σ_q is its covariance matrix (a diagonal covariance matrix or a full covariance matrix). The order of the mean vector μ_q and of the covariance matrix Σ_q, that is, the number of components along each dimension, is Dnb + Dwb.

The method of learning and generating the dictionary λ1_q in advance for the dictionary storage unit 322 is described below with reference to the flowchart in FIG. 5.

The signals used to generate the GMM are a group of wideband signals collected separately in advance, sampled at the same sampling frequency fs' [Hz] as the input signal x[n] and band-limited to the range from fs_wb_low [Hz] to fs_wb_high [Hz]. This signal group preferably covers many different environments and volumes. Below, the group of wideband signals used to generate the GMM is referred to collectively as the wideband signal data wb[n], where n denotes time (sample index).

First, the wideband signal data wb[n] is input and downsampled by a downsampling filter to the sampling frequency fs [Hz] to obtain narrowband signal data nb[n], band-limited to the narrow band from fs_nb_low [Hz] to fs_nb_high [Hz] (step S101). This produces a signal group that is band-limited in the same way as the collected sound signal z[n]. Although not shown, if the downsampling filter or the band-limiting process introduces algorithmic delay, the narrowband signal data nb[n] is synchronized with the wideband signal data wb[n].

Next, narrowband feature data Pnb[f,d] (d = 1, …, Dnb) is extracted from the narrowband signal data nb[n] in units of frames f (step S102). The narrowband feature data Pnb[f,d] is feature data of a predetermined order representing the narrowband signal information. In step S102, first, a frequency domain transform is applied to nb[n] for each frame in the same way as in the frequency domain transform unit 311 described above, giving the M-order power spectrum of nb[n] (step S1021). Next, power is calculated for each frame of nb[n] by the same processing as in the power normalization unit 321 described above, giving a first-order power value (step S1022). A normalized M-order power spectrum of nb[n] is then obtained from this power spectrum and power value (step S1023). Finally, the M-order normalized power spectrum and the first-order power value are concatenated frame by frame in the order (dimension) direction to generate the narrowband feature data Pnb[f,d] (d = 1, …, Dnb) of order Dnb (= M + 1) (step S1024).

Meanwhile, in parallel with the above, wideband feature data Pwb[f,d] (d = 1, …, Dwb) is extracted from the wideband signal data wb[n] in units of frames f (step S103). The wideband feature data Pwb[f,d] is feature data of a predetermined order representing the wideband signal information. In step S103, first, a frequency domain transform is applied to wb[n] for each frame in the same way as in the frequency domain transform unit 311, but with the number of FFT points doubled to 4M, giving the 2M-order power spectrum of wb[n] (step S1031). Next, power is calculated for each frame of wb[n] by the same processing as in the power normalization unit 321 described above, giving a first-order power value. This power value is used as the wideband feature data Pwb[f,d] of order Dwb (= 1) (step S1032).

Next, the time-synchronized narrowband feature data Pnb[f,d] (d = 1, …, Dnb) and wideband feature data Pwb[f,d] (d = 1, …, Dwb) are concatenated frame by frame in the order (dimension) direction to generate concatenated feature data P[f,d] (d = 1, …, Dnb+Dwb) of order Dnb+Dwb (step S104).

Then, an initial GMM with mixture number Q = 1 is generated from the concatenated feature data P[f,d]. Two processes are then repeated alternately: doubling the mixture number Q by slightly shifting the mean vector of each mixture component to generate another component, and maximum likelihood training of the GMM on the concatenated feature data P[f,d] with the EM algorithm until convergence. This yields the GMM λ1_q = {w_q, μ_q, Σ_q} (q = 1, …, Q) with mixture number Q (here Q = 64) (step S105). The EM algorithm is described in detail in, for example, D. A. Reynolds and R. C. Rose, "Robust text-independent speaker identification using Gaussian mixture models", IEEE Trans. Speech and Audio Processing, vol. 3, no. 1, pp. 72-83, Jan. 1995.

Returning to the description of FIG. 4, the narrowband power value Pow_N[f] and the normalized ambient noise power spectrum |Nn[f,w]|² (w = 0, 1, …, M−1) output from the power normalization unit 321 are concatenated and input to the wideband power calculation unit 323 as the Dnb-order narrowband feature data Pn_nb[f]. The wideband power calculation unit 323 reads the GMM dictionary λ1_q = {w_q, μ_q, Σ_q} (q = 1, …, Q) from the dictionary storage unit 322 and, following minimum mean square error (MMSE) estimation as shown in Expression (10), converts the input into feature data corresponding to the extended wide band by soft clustering with the plural normal distribution models and continuous linear regression. In this way it calculates and outputs the wideband power value N_wb_level[f], which is the wideband feature data, from the narrowband feature data Pn_nb[f]. Expression (10) is written in terms of vectors along the dimension direction (d = 1, …, Dnb+Dwb). The mean vector μ_q (d = 1, …, Dnb+Dwb) is divided along the dimension direction into μ_q^N (d = 1, …, Dnb) and μ_q^W (d = Dnb+1, …, Dnb+Dwb), and the covariance matrix Σ_q, a (Dnb+Dwb) × (Dnb+Dwb) matrix, is likewise divided into the Dnb × Dnb matrix Σ_q^NN, the Dnb × Dwb matrix Σ_q^NW, the Dwb × Dnb matrix Σ_q^WN, and the Dwb × Dwb matrix Σ_q^WW.

Since the ambient noise information band extension unit 32 treats the wideband feature data and the wideband signal information as identical, the wideband power value N_wb_level[f], which is the wideband signal information, is obtained in this way from the ambient noise power spectrum |N[f,w]|², which is the narrowband signal information.

FIG. 6 shows a configuration example of the signal characteristic correction unit 33. The signal characteristic correction unit 33 includes a frequency domain transform unit 331, a correction degree determination unit 332, a correction processing unit 333, and a time domain transform unit 334. The signal characteristic correction unit 33 receives the input signal x[n] (n = 0, 1, …, 2N−1) and the wideband power value N_wb_level[f], performs signal correction that makes the input signal x[n] clearer so that it is not buried in the ambient noise contained in the collected sound signal, and outputs the corrected output signal y[n] (n = 0, 1, …, 2N−1).

The frequency domain transform unit 331 receives the input signal x[n] (n = 0, 1, …, 2N−1) in place of the collected sound signal z[n] (n = 0, 1, …, N−1) received by the frequency domain transform unit 311, and outputs the frequency spectrum X[f,w] of the input signal x[n] by the same processing as the frequency domain transform unit 311. For example, with L = 96 samples of overlap with the previous frame, the frequency domain transform unit 331 prepares 4M = 512 samples from L samples of the previous frame's input signal, the 2N = 320 samples of the current frame's input signal x[n], and L samples of zero padding. These 4M samples are windowed by multiplying them by a sine window, and the windowed signal is transformed to the frequency domain by an FFT of size 4M.
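The frame assembly in this example (96 + 320 + 96 = 512 samples) can be sketched as follows. The exact sine window formula is not given in the text; the half-sample-offset form used below is an assumption, as are the function and constant names. The FFT itself is omitted.

```python
import math

OVERLAP = 96      # L: samples shared with the previous frame
FRAME = 320       # 2N: new input samples per frame
FFT_SIZE = 512    # 4M: OVERLAP + FRAME + OVERLAP zero-padded samples


def build_windowed_frame(prev_tail, current):
    """Assemble and window one 4M-sample analysis buffer.

    prev_tail: last L samples of the previous frame's input signal
    current:   the 2N samples x[n] of the current frame
    """
    if len(prev_tail) != OVERLAP or len(current) != FRAME:
        raise ValueError("unexpected buffer length")
    buf = list(prev_tail) + list(current) + [0.0] * OVERLAP
    # Assumed sine window: w[n] = sin(pi * (n + 0.5) / FFT_SIZE).
    window = [math.sin(math.pi * (n + 0.5) / FFT_SIZE) for n in range(FFT_SIZE)]
    return [b * w for b, w in zip(buf, window)]


frame = build_windowed_frame([1.0] * OVERLAP, [1.0] * FRAME)
```

The resulting buffer has exactly 4M = 512 samples and is ready for a 512-point FFT.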

The correction degree determination unit 332 receives the wideband power value N_wb_level[f] output from the ambient noise information band extension unit 32, and calculates and outputs the correction gain G[f,w] (w = 0, 1, …, 2M−1) by Expression (11).

N_0 in Expression (11) is the reference power value of the ambient noise, set by measuring in advance the ambient noise power in a normal usage environment at the same sampling frequency and with the same band limitation as the input signal x[n]. In this way, even in an environment where the ambient noise power is greater than in the normal usage environment (that is, an environment containing more ambient noise), the correction gain G[f,w] is set correspondingly larger, so that the input signal x[n] can be made clearer.

The correction processing unit 333 receives the frequency spectrum X[f,w] (w = 0, 1, …, 2M−1) of the input signal x[n] and the correction gain G[f,w] (w = 0, 1, …, 2M−1) output from the correction degree determination unit 332. It corrects the frequency spectrum X[f,w] of the input signal x[n] by Expression (12) and outputs the result, the frequency spectrum Y[f,w] (w = 0, 1, …, 2M−1) of the output signal y[n].
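The gain determination and spectrum correction steps can be sketched together as follows. Expressions (11) and (12) are not reproduced in this text, so everything specific here is an illustrative assumption rather than the patent's formula: the gain is taken to be the square root of the power ratio N_wb_level[f] / N_0, floored at 1 and identical for every bin, and the correction is taken to be a per-bin multiplication.

```python
import math


def correction_gain(n_wb_level, n0, num_bins):
    """Hypothetical gain rule: amplify only when the noise exceeds the reference.

    n_wb_level: wideband power value N_wb_level[f]
    n0:         reference ambient noise power N_0
    num_bins:   number of bins 2M
    """
    ratio = n_wb_level / n0
    g = math.sqrt(ratio) if ratio > 1.0 else 1.0  # amplitude gain, never below 1
    return [g] * num_bins


def apply_correction(spectrum, gain):
    """Assumed correction: Y[f, w] = G[f, w] * X[f, w]."""
    return [g * x for g, x in zip(gain, spectrum)]


gain_loud = correction_gain(4.0, 1.0, 3)
gain_quiet = correction_gain(0.5, 1.0, 2)
corrected = apply_correction([1 + 1j, 2j], [2.0, 2.0])
```

Because the gain is real and positive, the phase of each bin is unchanged; only its magnitude is scaled.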

The time domain transform unit 334 applies a time domain transform (inverse frequency transform) to the frequency spectrum Y[f,w] (w = 0, 1, …, 2M−1) output from the correction processing unit 333, undoes the overlap as appropriate while taking into account the windowing in the frequency domain transform unit 331, and calculates the output signal y[n], which is the corrected signal. For example, taking into account that the input signal x[n] is a real signal, the frequency spectrum Y[f,w] is restored to w = 0, 1, …, 4M−1, a 4M-point IFFT (Inverse Fast Fourier Transform) is performed, and the overlap is undone using the previous frame's corrected output signal y[n], with the windowing taken into account, to calculate the output signal y[n].
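The restoration of the spectrum from 2M to 4M bins relies on the conjugate symmetry of the spectrum of a real signal. A minimal sketch follows. The text does not say how the bin at w = 2M (the Nyquist bin) is handled, so setting it to zero is an assumption, and the function name is hypothetical.

```python
def restore_full_spectrum(half):
    """Extend Y[f, w], w = 0 .. 2M-1, to w = 0 .. 4M-1 by conjugate symmetry.

    For a real time-domain signal, Y[4M - w] = conj(Y[w]).
    Assumption: the bin w = 2M, which is not among the inputs, is set to zero.
    """
    two_m = len(half)
    full = list(half) + [0j]  # bins 0 .. 2M-1, then the bin at w = 2M
    # Bins 2M+1 .. 4M-1 mirror bins 2M-1 .. 1.
    full += [half[two_m - k].conjugate() for k in range(1, two_m)]
    return full


half = [1 + 0j, 2 + 3j, 4 - 1j, 5 + 5j]
full = restore_full_spectrum(half)
```

With the symmetry restored and a real-valued bin at w = 0, the 4M-point inverse transform yields a real signal, which is then combined with the previous frame's output in the overlap region.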

As described above, even when the input signal to be reproduced and the collected sound signal differ in the frequency band in which signal components exist or in sampling frequency, the volume of the collected sound signal is extended and estimated with the frequency band of the input signal taken into account. The volume of the collected sound signal is thereby obtained with high accuracy, and the intelligibility of the input signal can be improved.

In the above description, the present invention is applied to a communication apparatus, but as shown in FIG. 7(a), it can also be applied to a digital audio player. This digital audio player includes a storage unit 11 using flash memory or an HDD (Hard Disk Drive), and the decoder 2 decodes the music/audio data read from the storage unit 11. In this case, the target signal, that is, the desired signal to be decoded and reproduced, is a music/audio signal. The collected sound signal z(t) picked up by the microphone 6 of the digital audio player consists of the near-end speaker's voice, noise components caused by the surrounding environment, echo components caused by the output signal y(t) and the acoustic space, and so on, and contains no music/audio signal. Unlike in the communication apparatus, the near-end speaker's voice is not needed here, so all of these components, including the near-end speaker's voice, are treated as ambient noise.

Further, as shown in FIG. 7(b), the present invention can be applied to a voice band extension communication apparatus as one form of communication apparatus. This voice band extension communication apparatus includes a decoder 2A and has a signal band extension processing unit 12 between the decoder 2A and the signal processing unit 3. In this case, the signal processing unit 3 performs the processing described above on the band-extended input signal x'[n].

The processing performed by the signal band extension processing unit 12 extends a narrowband input signal band-limited from fs_nb_low [Hz] to fs_nb_high [Hz] into a wideband signal from fs_wb_low [Hz] to fs_wb_high [Hz]. It may be implemented with existing techniques such as those described in Japanese Patent No. 3189614, Japanese Patent No. 3243174, and Japanese Patent Laid-Open No. 9-55778.

(Modification 1 of the signal processing unit)
Next, an example is described in which the narrowband signal information used in the signal processing unit is the power spectrum of the ambient noise, and the wideband signal information is the masking threshold obtained when the ambient noise is extended to a wideband signal (the wideband masking threshold).

FIG. 8 shows this configuration. The signal processing unit 30 includes an ambient noise information band extension unit 34 and a signal characteristic correction unit 35 in place of the ambient noise information band extension unit 32 and the signal characteristic correction unit 33 used in the signal processing unit 3.

FIG. 9 shows a configuration example of the ambient noise information band extension unit 34. The ambient noise information band extension unit 34 includes a power normalization unit 321, a dictionary storage unit 342, a wideband power spectrum calculation unit 343, a wideband masking threshold calculation unit 344, and a power control unit 345.

Like the ambient noise information band extension unit 32, the ambient noise information band extension unit 34 receives the power spectrum of the ambient noise and generates information (wideband signal information) that includes the frequency band components present in the input signal x[n] but absent from the collected sound signal z[n]. That is, the ambient noise information band extension unit 34 calculates narrowband feature data from the narrowband signal information. The correspondence between narrowband feature data and wideband feature data is modeled in advance, and the unit uses this model together with the narrowband feature data just obtained to calculate wideband feature data, from which it generates wideband signal information. However, the ambient noise information band extension unit 34 models the correspondence between narrowband and wideband feature data with a codebook obtained by vector quantization. Here, the normalized ambient noise power spectrum |Nn[f,w]|² (w = 0, 1, …, M−1) is used as the Dnb-order narrowband feature data, and the normalized wideband power spectrum of the ambient noise |Nw[f,w]|² (w = 0, 1, …, 2M−1) is used as the Dwb-order wideband feature data (Dnb = M, Dwb = 2M). Specifically, the ambient noise information band extension unit 34 receives the ambient noise power spectrum |N[f,w]|² (w = 0, 1, …, M−1), generates by frequency band extension the power spectrum of the frequency band components that are present in the input signal x[n] but absent from the collected sound signal z[n], obtains a masking threshold for the band-extended power spectrum, and outputs the result as the wideband masking threshold N_wb_th[f,w] (w = 0, 1, …, 2M−1).

The dictionary storage unit 342 stores a codebook dictionary λ2_q = {μx_q, μy_q} (q = 1, …, Q) of size Q (here Q = 64), trained in advance so as to model the correspondence between the Dnb-order narrowband feature data and the Dwb-order wideband feature data. Here, μx_q is the centroid vector of the narrowband feature data in the q-th codebook entry, and μy_q is the centroid vector of the wideband feature data in the q-th codebook entry. The order of each code vector in the codebook is Dnb + Dwb, the sum of the numbers of components of the narrowband centroid vector μx_q and the wideband centroid vector μy_q.

The method of learning and generating the dictionary λ2_q in advance for the dictionary storage unit 342 is described below with reference to the flowchart in FIG. 10. In the following description, processes identical to those in the learning and generation method for the dictionary λ1_q described above are given the same step numbers, and overlapping explanation is omitted where appropriate for brevity.

The signals used to generate the codebook dictionary are a group of wideband signals collected separately in advance, sampled at the same sampling frequency fs' [Hz] as the input signal x[n] and band-limited to the range from fs_wb_low [Hz] to fs_wb_high [Hz]. This signal group preferably covers many different environments and volumes. Below, the group of wideband signals used to generate the codebook dictionary is referred to collectively as the wideband signal data wb[n], where n denotes time (sample index).

First, the wideband signal data wb[n] is input and downsampled to the sampling frequency fs [Hz] to obtain the narrowband signal data nb[n] (step S101). Then, narrowband feature data Pnb[f,d] (d = 1, …, Dnb), which is feature data representing the narrowband signal information, is extracted from the narrowband signal data nb[n] (step S202). In step S202, the power spectrum (M-order) of nb[n] is obtained (step S1021), the power value of nb[n] is obtained (step S1022), a normalized power spectrum of nb[n] is obtained from this power spectrum and power value (step S1023), and this normalized power spectrum is used as the narrowband feature data Pnb[f,d] (d = 1, …, Dnb) of order Dnb (= M).

Meanwhile, wideband feature data Pwb[f,d] (d = 1, …, Dwb), which is feature data representing the wideband signal information, is extracted from the wideband signal data wb[n] (step S203). In step S203, the power spectrum of wb[n] is obtained (step S1031), the power value of wb[n] is obtained frame by frame (step S2032), a normalized power spectrum of wb[n] is obtained frame by frame from this power spectrum and power value (step S2033), and this normalized power spectrum is used as the wideband feature data Pwb[f,d] (d = 1, …, Dwb) of order Dwb (= 2M).

Next, the narrowband feature data Pnb[f,d] (d = 1, …, Dnb) and the wideband feature data Pwb[f,d] (d = 1, …, Dwb) are concatenated to generate concatenated feature data P[f,d] (d = 1, …, Dnb+Dwb) of order Dnb+Dwb (step S104).

From the concatenated feature data P[f,d], a codebook dictionary λ2_q = {μx_q, μy_q} (q = 1, …, Q) of size Q (here Q = 64) is generated using a clustering technique such as the k-means algorithm or the LBG algorithm (step S205). In step S205, an initial codebook of size Q = 1 is first generated, with the narrowband centroid vector μx_1 set to the mean of all the narrowband feature data and the wideband centroid vector μy_1 set to the mean of all the wideband feature data (step S2051). It is then determined whether the codebook size Q has reached the predetermined number (here 64) (step S2052). If it has not, the codebook size Q is doubled by slightly shifting the narrowband centroid vector μx_q and the wideband centroid vector μy_q of each code vector in the codebook λ2_q to create an additional code vector (step S2053). Next, for each item of concatenated feature data P[f,d] of order Dnb+Dwb, the code vector whose narrowband centroid vector μx_q is closest under a predetermined distance measure (for example, Euclidean distance or Mahalanobis distance) is found, and P[f,d] is assigned to that code vector. Then, using the concatenated feature data P[f,d] assigned to each code vector, a new narrowband centroid vector μx_q and wideband centroid vector μy_q are computed for each code vector, and the codebook λ2_q is updated (step S2054). Once the codebook size Q has reached the predetermined number, the codebook λ2_q = {μx_q, μy_q} (q = 1, …, Q) is output.
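The split-and-update loop of steps S2051 to S2054 can be sketched as below. It is a minimal LBG-style illustration, not the patent's implementation. Several details are assumptions: the split adds or subtracts a small constant `shift`, the assignment and update are repeated a fixed number of times per codebook size, Euclidean distance is used, and `target_size` is a power of two. All function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_vec(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def nearest(x, codebook):
    """Index of the code vector whose narrowband centroid is closest to x."""
    return min(range(len(codebook)), key=lambda q: sq_dist(x, codebook[q][0]))

def train_codebook(pairs, target_size, shift=1e-3, iters=10):
    """pairs: list of (narrowband_vector, wideband_vector) training items."""
    # Step S2051: size-1 codebook made of the global means.
    codebook = [(mean_vec([x for x, _ in pairs]), mean_vec([y for _, y in pairs]))]
    # Step S2052: repeat until the target size is reached.
    while len(codebook) < target_size:
        # Step S2053: double the size by shifting every centroid pair slightly.
        codebook = [([v + s for v in cx], [v + s for v in cy])
                    for cx, cy in codebook for s in (shift, -shift)]
        # Step S2054: assign by narrowband distance, then update both centroids.
        for _ in range(iters):
            clusters = [[] for _ in codebook]
            for x, y in pairs:
                clusters[nearest(x, codebook)].append((x, y))
            codebook = [
                (mean_vec([x for x, _ in c]), mean_vec([y for _, y in c])) if c else old
                for c, old in zip(clusters, codebook)
            ]
    return codebook
```

Assigning by the narrowband part only, while updating both centroids, is what lets the finished codebook map a narrowband observation to a wideband estimate at run time.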

The wideband power spectrum calculation unit 343 receives the normalized power spectrum |Nn[f,w]|² (w = 0, 1, …, M-1) of the ambient noise output from the power normalization unit 321 as Dnb-order feature data, reads the codebook dictionary λ2_q = {μx_q, μy_q} (q = 1, …, Q) from the dictionary storage unit 342, and obtains the wideband power spectrum |Nw[f,w]|² (w = 0, 1, …, 2M-1) from the correspondence between the Dnb-order narrowband feature data and the Dwb-order wideband feature data. Specifically, among the Q narrowband centroid vectors μx_q (q = 1, …, Q), it finds the one closest to the normalized power spectrum |Nn[f,w]|² (w = 0, 1, …, M-1) of the ambient noise under a predetermined distance measure, and takes the wideband centroid vector μy_q of that closest code vector as the wideband power spectrum |Nw[f,w]|² (w = 0, 1, …, 2M-1).
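The lookup performed by the wideband power spectrum calculation unit 343 amounts to a nearest-neighbour search over the narrowband centroids. A minimal sketch follows, assuming Euclidean distance as the distance measure and a codebook held as a list of (μx_q, μy_q) pairs. The function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def extend_band(narrow_vec, codebook):
    """Return the wideband centroid paired with the closest narrowband centroid.

    codebook: list of (narrowband_centroid, wideband_centroid) pairs.
    """
    q = min(range(len(codebook)), key=lambda i: sq_dist(narrow_vec, codebook[i][0]))
    return codebook[q][1]
```

For example, with a two-entry codebook, an observation close to the second narrowband centroid returns the second wideband centroid unchanged; no interpolation between entries is performed in this sketch.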

The wideband masking threshold calculation unit 344 receives the wideband power spectrum |Nw[f,w]|² (w = 0, 1, …, 2M-1) output from the wideband power spectrum calculation unit 343 and calculates, for each frequency component, the wideband masking threshold N_wb_th1[f,w] (w = 0, 1, …, 2M-1), which is the masking threshold of the ambient noise.

In general, a masking threshold can be calculated by convolving a function called a spreading function with the power spectrum of the signal. That is, the wideband masking threshold N_wb_th1[f,w] (w = 0, 1, …, 2M-1) of the ambient noise is calculated by equation (13), with the spreading function written as the function sprdngf(). The wideband masking threshold N_wb_th1[f,w] indicates that, if the wideband power spectrum |Nw[f,w]|² of the ambient noise is at or below N_wb_th1[f,w], that component is masked by the wideband power spectrum of the ambient noise in frequency bands other than frequency bin ω. FIG. 11 shows examples of wideband masking thresholds of ambient noise recorded in various environments such as outdoors, with frequency [Hz] on the horizontal axis and power [dB] on the vertical axis.

Here, bark[w] denotes the Bark value obtained by converting frequency bin ω to the Bark scale; the spreading function converts to the Bark scale bark[w] as needed. The Bark scale is a scale that reflects the resolution of hearing: it is finer at low frequencies and coarser at high frequencies.
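To make the idea of convolving a spreading function over the Bark scale concrete, here is a simplified sketch. It is not the ISO/IEC 13818-7 spreading function and does not reproduce equation (13). The assumptions are: the Zwicker and Terhardt approximation for the Hz-to-Bark conversion, an illustrative two-slope spreading function (27 dB/Bark towards lower frequencies, 10 dB/Bark towards higher frequencies), a linear bin-to-frequency mapping, and a sum over the other bins only.

```python
import math

def hz_to_bark(f_hz):
    """Zwicker and Terhardt approximation of the Bark scale."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

def sprdngf(b_masker, b_maskee):
    """Illustrative two-slope spreading function, returned as a linear power ratio."""
    d = b_maskee - b_masker
    level_db = 27.0 * d if d < 0 else -10.0 * d  # steeper decay below the masker
    return 10.0 ** (level_db / 10.0)

def masking_threshold(power, fs):
    """Threshold per bin from the spread power of all *other* bins (assumption)."""
    n = len(power)
    bark = [hz_to_bark(w * (fs / 2.0) / n) for w in range(n)]
    return [
        sum(power[v] * sprdngf(bark[v], bark[w]) for v in range(n) if v != w)
        for w in range(n)
    ]
```

With a single spectral peak, the threshold produced by this sketch decays away from the peak and falls off more slowly on the high-frequency side, which is the qualitative behaviour a spreading function is meant to capture.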

Here, the spreading function sprdngf() is assumed to follow the method defined in ISO/IEC 13818-7. Other methods may also be used for the spreading function, for example those described in ITU-R BS.1387 or 3GPP TS 26.403. The scale need not be the Bark scale: a spreading function based on another scale derived from the perceptual characteristics of human pitch or from auditory filters, such as the Mel scale or the ERB scale, may be used as appropriate.

The power control unit 345 receives the narrowband power value Pow_N[f] output from the power normalization unit 321 and the wideband masking threshold N_wb_th1[f,w] (w = 0, 1, …, 2M-1) output from the wideband masking threshold calculation unit 344. It amplifies or attenuates N_wb_th1[f,w] so that its power in the range from fs_nb_low [Hz] to fs_nb_high [Hz] becomes equal to the narrowband power value Pow_N[f], and outputs the power-controlled N_wb_th1[f,w] as the wideband masking threshold N_wb_th[f,w].
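The power control step can be sketched as a single scaling factor applied to every bin. This is a sketch under assumptions: "power" in the band is taken to be the sum of the threshold values over the bins between `w_low` and `w_high` (the bins corresponding to fs_nb_low and fs_nb_high), and the same gain is applied to the whole wideband threshold. The function and parameter names are hypothetical.

```python
def power_control(n_wb_th1, pow_n, w_low, w_high):
    """Scale the threshold so its in-band power equals pow_n.

    n_wb_th1: wideband masking threshold per bin (linear power values).
    w_low, w_high: inclusive bin range covering the narrowband region.
    """
    band_power = sum(n_wb_th1[w_low:w_high + 1])
    # Leave the threshold untouched if the band is empty of energy.
    gain = pow_n / band_power if band_power > 0 else 1.0
    return [v * gain for v in n_wb_th1]
```

Because the codebook operates on normalized features, a step of this kind is what restores the absolute level of the estimated threshold.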

In this way, the ambient noise information band extension unit 34 obtains the wideband masking threshold N_wb_th[f,w], which is wideband signal information, from the power spectrum |N[f,w]|² of the ambient noise, which is narrowband signal information.

FIG. 12 shows a configuration example of the signal characteristic correction unit 35. The signal characteristic correction unit 35 includes a frequency domain conversion unit 331, a power calculation unit 352, a masking threshold calculation unit 353, a masking determination unit 354, a power smoothing unit 355, a correction degree determination unit 356, a correction processing unit 333, and a time domain conversion unit 334.

The signal characteristic correction unit 35 receives the input signal x[n] (n = 0, 1, …, 2N-1) and the wideband masking threshold N_wb_th[f,w], performs signal correction processing that clarifies the input signal x[n] so that it is not buried in the ambient noise contained in the collected sound signal, and outputs the corrected output signal y[n] (n = 0, 1, …, 2N-1).

The power calculation unit 352 calculates and outputs the power spectrum |X[f,w]|² (w = 0, 1, …, 2M-1), which is the sum of the squares of the real part and the imaginary part of the frequency spectrum X[f,w] (w = 0, 1, …, 2M-1) of the input signal x[n] output from the frequency domain conversion unit 331.

The masking threshold calculation unit 353 receives the power spectrum |X[f,w]|² (w = 0, 1, …, 2M-1) of the input signal x[n] output from the power calculation unit 352 and, with the spreading function written as the function sprdngf(), calculates and outputs the wideband masking threshold X_th[f,w] (w = 0, 1, …, 2M-1) of the input signal x[n] using equation (14). The wideband masking threshold X_th[f,w] indicates that, if the power spectrum |X[f,w]|² of the input signal x[n] is at or below X_th[f,w], that component is masked by the power spectrum |X[f,w]|² of the input signal x[n] in frequency bands other than frequency bin ω.

The masking determination unit 354 receives the power spectrum |X[f,w]|² (w = 0, 1, …, 2M-1) output from the power calculation unit 352 and the wideband masking threshold X_th[f,w] output from the masking threshold calculation unit 353, and outputs masking determination information X_flag[f,w] (w = 0, 1, …, 2M-1) indicating, for each frequency band, whether the component is masked by the input signal x[n] itself. Specifically, it compares the power spectrum |X[f,w]|² with the wideband masking threshold X_th[f,w]. If |X[f,w]|² is equal to or greater than X_th[f,w], the frequency component is regarded as not masked by other frequency components of the input signal x[n], and X_flag[f,w] = 0 is set. If |X[f,w]|² is less than X_th[f,w], the frequency component is regarded as masked by other frequency components of the input signal x[n], and X_flag[f,w] = 1 is set.

The power smoothing unit 355 receives the power spectrum |X[f,w]|² (w = 0, 1, …, 2M-1) output from the power calculation unit 352 and the masking determination information X_flag[f,w] output from the masking determination unit 354, smooths the power spectrum |X[f,w]|² by a moving average with a triangular window according to equation (15), and outputs the smoothed power spectrum |X_S[f,w]|². Here, K is the range over which the smoothing is calculated, and α_X[j] is a smoothing coefficient that becomes larger as j approaches 0. For example, K = 3 and α_X[j] = [0.1, 0.2, 0.4, 0.8, 0.4, 0.2, 0.1].

The correction degree determination unit 356 receives the smoothed power spectrum |X_S[f,w]|² (w = 0, 1, …, 2M-1) output from the power smoothing unit 355, the masking determination information X_flag[f,w] (w = 0, 1, …, 2M-1) output from the masking determination unit 354, and the wideband masking threshold N_wb_th[f,w] (w = 0, 1, …, 2M-1) output from the ambient noise information band extension unit 34, and calculates and outputs the correction gain G[f,w] (w = 0, 1, …, 2M-1). The correction gain G[f,w] is calculated as follows. First, for a frequency band determined by the masking determination information to be masked by other frequency components of the input signal x[n] (X_flag[f,w] = 1), G[f,w] = 1 is set so that the correction neither amplifies nor attenuates. Next, for a frequency band determined not to be masked by other frequency components of the input signal x[n] (X_flag[f,w] = 0), the power spectrum |X[f,w]|² is compared with the wideband masking threshold N_wb_th[f,w] of the ambient noise. If |X[f,w]|² is equal to or greater than N_wb_th[f,w], the frequency component is not masked by other frequency components of the collected sound signal z[n], so G[f,w] = 1 is set and no amplification is applied. If, on the other hand, |X[f,w]|² is less than N_wb_th[f,w], the component is judged to be masked because of the ambient noise in the collected sound signal z[n], even though it would be perceptible if that noise were lower, and the correction gain G[f,w] is calculated from the ratio between the wideband masking threshold N_wb_th[f,w] of the ambient noise and the smoothed power spectrum |X_S[f,w]|², as in equation (16).

The function F is a function that amplifies the smoothed power spectrum |X_S[f,w]|² so that its spectral slope becomes nearly parallel to the shape of the wideband masking threshold N_wb_th[f,w] of the ambient noise. Here, α and β are positive constants, and γ is a constant that may be positive or negative. These constants are used to adjust the degree of amplification of the input signal x[n].

In the correction degree determination unit 356, the correction gain G[f,w] obtained in this way may be further smoothed by a moving average with a triangular window according to equation (22), and the smoothed correction gain G_S[f,w] may be used instead. Here, K is the range over which the smoothing is calculated, and α_G[j] is a smoothing coefficient that becomes larger as j approaches 0. For example, K = 3 and α_G[j] = [0.1, 0.2, 0.4, 0.8, 0.4, 0.2, 0.1].

As described above, even if the reproduced input signal and the collected sound signal differ in the frequency band in which signal components exist, or differ in sampling frequency, the power spectrum that characterizes the collected sound signal is estimated with band extension that takes the frequency band of the input signal into account. The frequency characteristics of the collected sound signal are thereby obtained with high accuracy, and the intelligibility of the input signal can be improved.
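The per-bin decision logic of the masking determination unit 354 and the correction degree determination unit 356 can be sketched as follows. Equation (16) and the function F are not reproduced in this section, so the gain for the amplified case is passed in as a callable; `ratio_gain` is a purely hypothetical stand-in (the square root of the threshold-to-power ratio) and is not the patent's formula. The other function names are also hypothetical.

```python
import math

def masking_flags(power, x_th):
    """X_flag per bin: 0 if power >= self-masking threshold, else 1."""
    return [0 if p >= t else 1 for p, t in zip(power, x_th)]

def ratio_gain(n_th, smoothed_power, eps=1e-12):
    """Hypothetical placeholder for equation (16): gain from the ratio only."""
    return math.sqrt(n_th / (smoothed_power + eps))

def correction_gains(power, smoothed, flags, n_wb_th, gain_fn=ratio_gain):
    """Correction gain per bin following the three cases in the text."""
    gains = []
    for p, ps, flag, n_th in zip(power, smoothed, flags, n_wb_th):
        if flag == 1:
            gains.append(1.0)              # masked by the input signal itself
        elif p >= n_th:
            gains.append(1.0)              # already above the noise threshold
        else:
            gains.append(gain_fn(n_th, ps))  # masked only because of the noise
    return gains
```

Only bins that are audible within the input signal itself but buried under the ambient noise threshold receive a gain other than 1, which keeps the correction from boosting components a listener could not hear anyway.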

When this modification is applied to the voice band extension communication device shown in FIG. 7(b), the signal characteristic correction unit 35 does not perform signal correction processing for the frequency band at or below a preset frequency f_limit (about 500 to 1200 [Hz]; for example, f_limit = 1000 [Hz]) whenever the signal band extension processing unit 12 extends a low frequency band at or below f_limit, that is, when fs_wb_low < fs_nb_low and fs_wb_low < f_limit. In the low range (frequencies at or below f_limit), the ambient noise varies widely depending on the recording environment and the type of noise component. Skipping the correction there prevents this variation from making the signal correction processing unstable in the low frequency band extended by the signal band extension processing unit 12.
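The condition above can be expressed as a per-bin mask. The sketch below assumes a linear mapping from bin index to frequency over 0 to fs_wb/2. The function name, the parameter `fs_wb` for the wideband sampling frequency, and the boolean-list output are illustrative choices, not taken from the patent.

```python
def bins_to_correct(num_bins, fs_wb, fs_wb_low, fs_nb_low, f_limit=1000.0):
    """Return True for bins where signal correction is applied.

    Correction at or below f_limit is skipped only when the low band was
    extended: fs_wb_low < fs_nb_low and fs_wb_low < f_limit.
    """
    skip_low = fs_wb_low < fs_nb_low and fs_wb_low < f_limit
    mask = []
    for w in range(num_bins):
        f_hz = w * (fs_wb / 2.0) / num_bins  # assumed linear bin-to-Hz mapping
        mask.append(not (skip_low and f_hz <= f_limit))
    return mask
```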

(Modification 2 of the signal processing unit)
In this modification, the narrowband signal information used in the signal processing unit 30 shown in FIG. 8 is the power spectrum of the ambient noise, and the wideband signal information is the wideband power spectrum of the ambient noise (the power spectrum obtained when the ambient noise is extended to a wideband signal). In this case, the ambient noise information band extension unit 34 receives the power spectrum of the ambient noise, which is the narrowband signal information, and calculates the normalized power spectrum of the ambient noise as narrowband feature data. It then calculates the normalized wideband power spectrum of the ambient noise, which is the wideband feature data, using a correspondence between narrowband feature data and wideband feature data that has been modeled in advance, and generates the wideband power spectrum of the ambient noise, which is the wideband signal information, from this wideband feature data. The correspondence between the narrowband feature data and the wideband feature data is modeled using the GMM-based technique shown in FIG. 5. As a result, even if the reproduced input signal and the collected sound signal differ in the frequency band in which signal components exist, or differ in sampling frequency, the power spectrum that characterizes the collected sound signal is estimated with band extension that takes the frequency band of the input signal into account. The frequency characteristics of the collected sound signal are thereby obtained with high accuracy, and the intelligibility of the input signal can be improved.

(Modification 3 of the signal processing unit)
Next, a case is described in which the narrowband signal information used in the signal processing unit is the power spectrum of the ambient noise and the wideband signal information is the masking threshold obtained when the ambient noise is extended to a wideband signal (the wideband masking threshold).

FIG. 13 shows this configuration. The signal processing unit 300 uses an ambient noise information band extension unit 36 in place of the ambient noise information band extension unit 34 used in the signal processing unit 30.

FIG. 14 shows a configuration example of the ambient noise information band extension unit 36. The ambient noise information band extension unit 36 includes a power normalization unit 321, a narrowband masking threshold calculation unit 362, a band control unit 363, a dictionary storage unit 364, a wideband masking threshold calculation unit 365, a threshold correction unit 366, and a power control unit 345.

Like the ambient noise information band extension unit 34, the ambient noise information band extension unit 36 receives information on the frequency band components of the collected sound signal z[n] (narrowband signal information) and generates information that also covers frequency band components present in the input signal x[n] but absent from the collected sound signal z[n] (wideband signal information). That is, the ambient noise information band extension unit 36 calculates narrowband feature data from the narrowband signal information, calculates wideband feature data by applying a model of the correspondence between narrowband and wideband feature data, built in advance, to the narrowband feature data obtained, and generates the wideband signal information from the wideband feature data. Here, the ambient noise information band extension unit 36 models the correspondence between the narrowband feature data and the wideband feature data with a codebook obtained by vector quantization. The band-controlled narrowband masking threshold N_th[f,w] (w = 0, 1, …, M_C-1) of the ambient noise is used as the Dnb-order narrowband feature data, and the wideband masking threshold N_wb_th1[f,w] (w = 0, 1, …, 2M-1) of the ambient noise is used as the Dwb-order wideband feature data (Dnb = M_C, Dwb = 2M). Specifically, the ambient noise information band extension unit 36 receives the power spectrum |N[f,w]|² (w = 0, 1, …, M-1) of the ambient noise, obtains the masking threshold of the ambient noise, band-limits this masking threshold, extends the band-limited masking threshold to cover the frequency band components present in the input signal x[n] but absent from the collected sound signal z[n], and outputs the resulting band-extended masking threshold as the wideband masking threshold N_wb_th[f,w] (w = 0, 1, …, 2M-1).

The narrowband masking threshold calculation unit 362 receives the normalized power spectrum |Nn[f,w]|² (w = 0, 1, …, M-1) of the ambient noise output from the power normalization unit 321 and calculates, for each frequency component, the narrowband masking threshold N_th1[f,w] (w = 0, 1, …, M-1), which is the masking threshold of the ambient noise. As in the wideband masking threshold calculation unit 344 described above, but with the data length 2M replaced by M, the narrowband masking threshold N_th1[f,w] (w = 0, 1, …, M-1) of the ambient noise is calculated by equation (19), with the spreading function written as the function sprdngf(). The narrowband masking threshold N_th1[f,w] indicates that, if the normalized power spectrum |Nn[f,w]|² of the ambient noise is at or below N_th1[f,w], that component is masked by the normalized power spectrum of the ambient noise in frequency bands other than frequency bin ω.

The band control unit 363 receives the narrowband masking threshold N_th1[f,w] (w = 0, 1, …, M-1) of the ambient noise output from the narrowband masking threshold calculation unit 362, restricts it so that only signal information in the frequency band from the lower band-control frequency limit_low [Hz] to the upper band-control frequency limit_high [Hz] is used, and outputs the band-controlled narrowband masking threshold N_th[f,w]. Here, fs_nb_low ≦ limit_low < limit_high ≦ fs_nb_high < fs/2. For example, with limit_low = 1000 [Hz] and limit_high = 3400 [Hz], converting these frequencies to frequency bins ω by equation (24) means that only w = 32, 33, …, 108 of the narrowband masking threshold N_th1[f,w] (w = 0, 1, …, M-1) are used. With M_C denoting the number of elements of N_th[f,w], the band-controlled narrowband masking threshold N_th[f,w] (w = 0, 1, …, M_C-1) is assigned the values of the narrowband masking threshold N_th1[f,w] (w = 32, …, 108) as they are. In this case, M_C = 108 - 32 + 1 = 77.
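The band control step reduces to slicing the threshold between two bin indices. The sketch below uses an assumed frequency-to-bin conversion, floor(f × 2M / fs). Equation (24) is not reproduced in this section, and the values fs = 8000 [Hz] and M = 128 in the usage note are assumptions chosen because they reproduce the bin range 32 to 108 stated in the text. The function names are hypothetical.

```python
import math

def hz_to_bin(f_hz, fs, num_bins):
    """Assumed conversion: num_bins bins span 0 to fs/2, rounded down."""
    return int(math.floor(f_hz * 2 * num_bins / fs))

def band_control(n_th1, fs, limit_low, limit_high):
    """Keep only the bins between limit_low and limit_high (inclusive)."""
    lo = hz_to_bin(limit_low, fs, len(n_th1))
    hi = hz_to_bin(limit_high, fs, len(n_th1))
    return n_th1[lo:hi + 1]
```

With fs = 8000 and a 128-bin threshold, `hz_to_bin(1000, 8000, 128)` gives 32 and `hz_to_bin(3400, 8000, 128)` gives 108, so `band_control` returns 77 values, matching M_C = 77.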

As FIG. 11 also shows, the masking threshold of the ambient noise varies widely in the low range depending on the recording environment and the type of noise component. Because the ambient noise consists mainly of noise components, the narrowband masking threshold N_th1[f,w] also varies widely in the low range. Therefore, to obtain the wideband masking threshold with high accuracy when the correspondence between the narrowband feature data and the wideband feature data is modeled with a vector-quantization codebook, band control is applied so that the highly variable low range is not used. That is, the lower band-control frequency limit_low [Hz] is preferably set to the lower end of the frequency band in which the variance of the narrowband masking threshold is smaller than a predetermined value. This allows the wideband masking threshold to be obtained with high accuracy and improves the intelligibility of the input signal.

In addition, the masking threshold is calculated from the power spectrum of the surrounding frequency bands as well as that of the frequency band in question. Consequently, the masking threshold cannot be calculated accurately near the edge of the band to which the original signal is limited. That is, the upper band-control frequency limit_high [Hz] is preferably set to the upper end of the frequency band in which the masking threshold can still be obtained accurately given the band limitation. This allows the wideband masking threshold to be obtained with high accuracy and improves the intelligibility of the input signal.

The dictionary storage unit 364 stores a codebook dictionary λ3_q = {μx_q, μy_q} (q = 1, …, Q) of size Q (here Q = 64), learned in advance by modeling the correspondence between the Dnb-order narrowband feature data and the Dwb-order wideband feature data. Here, μx_q is the centroid vector of the narrowband feature data in the q-th codebook entry, and μy_q is the centroid vector of the wideband feature data in the q-th codebook entry. The order of each code vector in the codebook is Dnb+Dwb, the sum of the orders of the narrowband centroid vector μx_q and the wideband centroid vector μy_q.

One method of learning and generating the dictionary λ3_q stored in advance in the dictionary storage unit 364 is described with reference to the flowchart in FIG. 15. In the following, processing identical to that in the learning and generation method for the dictionary λ2_q in Modification 1 above carries the same step numbers, and duplicate description is omitted where appropriate for brevity.

First, the wideband signal data wb[n] is taken as input and downsampled to the sampling frequency fs [Hz] to obtain the narrowband signal data nb[n] (step S101). Next, narrowband feature data Pnb[f,d] (d=1,…,Dnb), which is feature data representing the narrowband signal information, is extracted from nb[n] (step S202). In step S202, the power spectrum (order M) of nb[n] is obtained (step S1021), the power value of nb[n] is obtained (step S1022), the normalized power spectrum of nb[n] is obtained from this power spectrum and power value (step S1023), and the masking threshold of nb[n] is calculated in the same manner as in equation (23) (step S3024). Band control is then applied to the masking threshold of nb[n] in the same manner as in the band control unit 363 (step S3025). The result is taken as the narrowband feature data Pnb[f,d] (d=1,…,Dnb) of order Dnb (=M_C), which completes the narrowband feature extraction.

Meanwhile, wideband feature data Pwb[f,d] (d=1,…,Dwb), which is feature data representing the wideband signal information, is extracted from the wideband signal data wb[n] (step S303). In step S303, the power spectrum (order 2M) of wb[n] is obtained (step S1031), the power value of wb[n] is obtained from wb[n] (step S2032), the normalized power spectrum of wb[n] is obtained frame by frame from this power spectrum and power value (step S2033), and the masking threshold of wb[n] is calculated in the same manner as in equation (23) with the order changed from M to 2M (step S3034). The result is taken as the wideband feature data Pwb[f,d] (d=1,…,Dwb) of order Dwb (=2M), which completes the wideband feature extraction.

Then, from the concatenated feature data P[f,d], the narrowband centroid vector μx_q and the wideband centroid vector μy_q of each code vector are obtained, and a codebook of size Q (here, Q=64) is generated by a clustering method such as the k-means algorithm or the LBG algorithm (step S205). The wideband centroid vector μy_q of each code vector, which is a masking threshold of the wideband signal data wb[n], is expressed by approximate polynomial coefficients; these coefficients are stored in the dictionary as the wideband centroid vector μ'y_q, producing the dictionary λ3_q = {μx_q, μ'y_q} (q=1,…,Q) (step S307). Here, the approximate polynomial coefficients m_p (p=0,…,P) are the coefficients of a polynomial of a predetermined order (P, for example P=6) that approximates the masking threshold as in equation (20), with the power value X [dB] on the vertical axis and the frequency Y [Hz] on the horizontal axis; the term is used in that sense hereafter.

Storing the masking threshold in the dictionary as approximate polynomial coefficients in this way requires less memory than storing the masking threshold itself, and because the dictionary arrays are smaller, the amount of processing when the dictionary is used is also reduced.
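To make the polynomial representation concrete, the following sketch fits a low-order polynomial to one wideband centroid and restores the threshold from the coefficients. It is an illustration under stated assumptions: equation (20) is not reproduced in this section, so the ordinary form m_0 + m_1·x + … + m_P·x^P is assumed; frequency is normalized to [0, 1] for numerical stability; and the helper names `polyfit` and `polyval` are hypothetical.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial fit; returns m_0 .. m_P (lowest power first)."""
    n = degree + 1
    # Normal equations A m = b with A[i][j] = sum x^(i+j), b[i] = sum y * x^i.
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][k] * coeffs[k] for k in range(i + 1, n))
        coeffs[i] = (b[i] - s) / a[i][i]
    return coeffs


def polyval(coeffs, x):
    """Evaluate m_0 + m_1 x + ... + m_P x^P with Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


# Example: a 16-bin wideband centroid (in dB) that is exactly quadratic.
bins = 16
xs = [w / (bins - 1) for w in range(bins)]
mu_y = [40.0 - 30.0 * x + 10.0 * x * x for x in xs]
mu_y_coeffs = polyfit(xs, mu_y, degree=2)       # stored in place of mu_y
restored = [polyval(mu_y_coeffs, x) for x in xs]
```

In this toy case 16 threshold values are replaced by 3 coefficients; with the source's 2M bins and P=6, each wideband centroid would shrink from 2M values to 7.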

The wideband masking threshold calculation unit 365 receives the band-controlled narrowband masking threshold N_th[f,w] (w=0,1,…,M_C-1) output from the band control unit 363 as Dnb-dimensional feature data, reads the codebook dictionary λ3_q = {μx_q, μ'y_q} (q=1,…,Q) from the dictionary storage unit 364, and obtains the wideband masking threshold N_wb_th1[f,w] (w=0,1,…,2M-1) of the ambient noise from the correspondence between the Dnb-dimensional narrowband feature data and the Dwb-dimensional wideband feature data. Specifically, among the Q narrowband centroid vectors μx_q (q=1,…,Q), it finds the one closest to the band-controlled narrowband masking threshold N_th[f,w] (w=0,1,…,M_C-1) under a predetermined distance measure, takes the wideband centroid vector μ'y_q of that code vector directly as the approximate polynomial coefficients of the wideband masking threshold, and calculates N_wb_th1[f,w] (w=0,1,…,2M-1) in the same manner as in equation (20).
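The lookup performed by the wideband masking threshold calculation unit 365 can be sketched as a nearest-neighbor search followed by a polynomial evaluation. The sketch assumes a squared Euclidean distance (the source only says "a predetermined distance measure"), a polynomial in normalized frequency with coefficients stored lowest power first, and the hypothetical function names shown.

```python
def nearest_codeword(n_th, narrow_centroids):
    """Index q of the narrowband centroid closest to n_th.

    Squared Euclidean distance is an assumption made for this sketch.
    """
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(n_th, c))
    return min(range(len(narrow_centroids)), key=lambda q: dist(narrow_centroids[q]))


def expand_threshold(n_th, narrow_centroids, wide_coeffs, num_wide_bins):
    """Return (q, wideband threshold) for a band-controlled narrowband threshold."""
    q = nearest_codeword(n_th, narrow_centroids)
    coeffs = wide_coeffs[q]                  # mu'y_q: polynomial coefficients
    n_wb_th1 = []
    for w in range(num_wide_bins):
        x = w / (num_wide_bins - 1)          # normalized frequency (assumption)
        value = 0.0
        for c in reversed(coeffs):           # Horner evaluation
            value = value * x + c
        n_wb_th1.append(value)
    return q, n_wb_th1


# Toy dictionary with Q = 2 entries.
mu_x = [[10.0, 10.0, 10.0], [20.0, 20.0, 20.0]]
mu_y_coeffs = [[10.0, 0.0], [20.0, -5.0]]
q, n_wb_th1 = expand_threshold([19.0, 21.0, 20.0], mu_x, mu_y_coeffs, num_wide_bins=5)
```

Because only Q distances are computed per frame and the wideband side is a short coefficient list, the per-frame cost stays small even for a larger codebook.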

The threshold correction unit 366 receives the narrowband masking threshold N_th1[f,w] (w=0,1,…,M-1) of the ambient noise output from the narrowband masking threshold calculation unit 362 and the wideband masking threshold N_wb_th1[f,w] (w=0,1,…,2M-1) of the ambient noise output from the wideband masking threshold calculation unit 365, corrects the wideband masking threshold so as to remove any discontinuity or derivative discontinuity near the boundary between the narrowband and the wideband, and outputs the corrected wideband masking threshold N_wb_th2[f,w] (w=0,1,…,2M-1). FIG. 16(a) shows an example in which a discontinuity arises between the narrowband masking threshold N_th[f,w] and the wideband masking threshold N_wb_th1[f,w] at frequencies around the boundary fs/2 [Hz], together with the wideband masking threshold N_wb_th2[f,w] corrected to remove it. FIG. 16(b) shows an example in which both a discontinuity and a derivative discontinuity arise between N_th[f,w] and N_wb_th1[f,w] at frequencies around the boundary fs/2 [Hz], together with the wideband masking threshold N_wb_th2[f,w] corrected to remove them. In both figures, the solid line is the narrowband masking threshold N_th[f,w], the broken line is the wideband masking threshold N_wb_th1[f,w], and the thick solid line is the corrected portion of the corrected wideband masking threshold N_wb_th2[f,w]. Here, adjust_low [Hz] < fs/2 < adjust_high [Hz], where adjust_low is at or above the frequency of bin ω_L−1 and below the frequency of bin ω_L, and adjust_high is at or above the frequency of bin ω_H and below the frequency of bin ω_H+1. For example, when fs=8000 [Hz], adjust_low=3600 [Hz] and adjust_high=4400 [Hz].

Specifically, when a discontinuity or derivative discontinuity is detected at least at frequencies around the boundary fs/2 [Hz], the region near the boundary, from adjust_low [Hz] to adjust_high [Hz] inclusive, is corrected as follows. Using the wideband masking threshold N_wb_th1[f,w] at frequency bins ω_L, ω_L+1, …, ω_L+S and ω_H, ω_H−1, …, ω_H−S, the wideband masking threshold from bin ω_L+S+1 to bin ω_H−S−1 is modeled by a function of order (2S−1) and spline interpolation is performed, giving the corrected wideband masking threshold N_wb_th2[f,w]. The modeling function may also be chosen so that it passes through the midpoint between the narrowband masking threshold N_th1[f,M-1] and the wideband masking threshold N_wb_th1[f,M] before the spline interpolation is performed.
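The boundary correction can be illustrated with a small interpolation sketch. Note the substitution: the source specifies a function of order (2S−1) with spline interpolation, whereas this example uses plain Lagrange interpolation through all 2S+2 anchor bins, chosen only because it is short and dependency-free. The names `lagrange` and `correct_boundary` and the bin indices in the example are hypothetical.

```python
def lagrange(xs, ys, x):
    """Value at x of the polynomial passing through all points (xs[i], ys[i])."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def correct_boundary(n_wb_th1, w_low, w_high, s):
    """Replace bins w_low+s+1 .. w_high-s-1 by values interpolated from the
    anchor bins w_low .. w_low+s and w_high-s .. w_high.

    Requires w_low + s < w_high - s so that the anchors do not overlap.
    """
    anchors = list(range(w_low, w_low + s + 1)) + list(range(w_high - s, w_high + 1))
    values = [n_wb_th1[w] for w in anchors]
    corrected = list(n_wb_th1)
    for w in range(w_low + s + 1, w_high - s):
        corrected[w] = lagrange(anchors, values, w)
    return corrected


# A falling threshold with a dip at bins 4 and 5 near the band boundary.
n_wb_th1 = [10.0, 9.0, 8.0, 7.0, 3.0, 2.0, 4.0, 3.0, 2.0, 1.0]
n_wb_th2 = correct_boundary(n_wb_th1, w_low=2, w_high=7, s=1)
```

Here the anchors (bins 2, 3, 6, 7) lie on a straight line, so the two corrected bins fall back onto that line and the jump disappears; bins outside the corrected span are left untouched.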

Correcting the wideband masking threshold in the threshold correction unit 366 in this way removes the discontinuity or derivative discontinuity in the wideband masking threshold. The signal correction then has no discontinuity in the frequency direction either, so the corrected signal sounds natural and a high sense of clarity is obtained.

As described above, even when the reproduced input signal and the collected sound signal differ in the frequency band in which signal components exist or in sampling frequency, the masking threshold of the collected sound signal is estimated by extending its band to take the frequency band of the input signal into account. The masking threshold of the collected sound signal is thereby obtained with high accuracy, and the intelligibility of the input signal can be improved.

(Modification 4 of the signal processing unit)
Another method of learning and generating the dictionary λ3_q stored in advance in the dictionary storage unit 364 of the signal processing unit 300 is described with reference to the flowchart in FIG. 17. This method learns and generates the dictionary λ3_q from the wideband signal data wb[n] alone, without generating the narrowband signal data nb[n]. In the following, processing identical to that in the learning and generation method for the dictionary λ3_q in Modification 2 above carries the same step numbers, and duplicate description is omitted where appropriate for brevity.

First, in step S303, wideband feature data Pwb[f,d] (d=1,…,Dwb), which is feature data (here, a masking threshold) representing the wideband signal information, is extracted from the wideband signal data wb[n]. Using only this wideband feature data Pwb[f,d] (d=1,…,Dwb), a codebook of size Q is created in step S205. Then, for the wideband centroid vector μy_q of each code vector, which is a wideband masking threshold of the wideband signal data wb[n], control is applied so that only the part of the wideband masking threshold in the frequency band from the lower limit frequency limit_low [Hz] to the upper limit frequency limit_high [Hz] for band control is used (step S3025). This yields a narrowband masking threshold band-controlled to the narrowband, which is taken as the narrowband centroid vector μx_q (q=1,…,Q) of each code vector (step S306). After that, in step S307, it is stored in the dictionary together with the wideband centroid vector μ'y_q, the approximate polynomial coefficients of the masking threshold of the wideband signal data wb[n], producing the dictionary λ3_q = {μx_q, μ'y_q}.
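The wideband-only training flow can be sketched as follows. This is a simplified illustration: the k-means routine seeds its centroids with the first Q training vectors (a simplification; a real trainer would use a more careful initialization or the LBG splitting procedure), band control is reduced to slicing by hypothetical bin indices `low_bin` and `high_bin`, and the conversion of wideband centroids to polynomial coefficients (step S307) is omitted.

```python
def kmeans(vectors, q, iterations=20):
    """Plain k-means; the first q vectors seed the centroids (a simplification)."""
    centroids = [list(v) for v in vectors[:q]]
    for _ in range(iterations):
        clusters = [[] for _ in range(q)]
        for v in vectors:
            best = min(
                range(q),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(v, centroids[k])),
            )
            clusters[best].append(v)
        for k, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def build_dictionary(wide_features, q, low_bin, high_bin):
    """Cluster wideband features only, then band-limit each wideband centroid
    to bins low_bin .. high_bin to obtain the narrowband centroid."""
    wide_centroids = kmeans(wide_features, q)
    narrow_centroids = [c[low_bin:high_bin + 1] for c in wide_centroids]
    return narrow_centroids, wide_centroids


# Four toy wideband masking thresholds (4 bins each), clustered into Q = 2.
pwb = [
    [10.0, 10.0, 10.0, 10.0],
    [30.0, 30.0, 30.0, 30.0],
    [12.0, 12.0, 12.0, 12.0],
    [28.0, 28.0, 28.0, 28.0],
]
mu_x, mu_y = build_dictionary(pwb, q=2, low_bin=1, high_bin=2)
```

Since the narrowband centroids are slices of the wideband centroids, the two halves of every dictionary entry are consistent by construction.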

In the method of FIG. 15, which also uses the narrowband feature data for clustering, the narrowband feature data contains errors near the boundary between the narrowband and the wideband. In the present method, by contrast, clustering uses only the wideband feature data, which is ideal data, and the narrowband centroid vectors are obtained by band-limiting the wideband centroid vectors. Clustering can therefore be performed with higher accuracy than in the method of FIG. 15.

(Modification 5 of the signal processing unit)
Another method of learning and generating the dictionary λ3_q stored in advance in the dictionary storage unit 364 of the signal processing unit 300 is described with reference to the flowchart in FIG. 18. In the following, processing identical to that in the learning and generation method for the dictionary λ3_q in Modification 2 above carries the same step numbers, and duplicate description is omitted where appropriate for brevity.

After the codebook of size Q is created in step S205, the narrowband centroid vector μx_q of each code vector, which is a masking threshold of the narrowband signal data nb[n], is expressed by an approximate polynomial as in equation (20), and the approximate polynomial coefficients are taken as the narrowband centroid vector μ'x_q (q=1,…,Q) (step S306A). After that, in step S307, it is stored in the dictionary together with the wideband centroid vector μ'y_q, the approximate polynomial coefficients of the masking threshold of the wideband signal data wb[n], producing the dictionary λ3_q = {μ'x_q, μ'y_q}.

In this method, the wideband masking threshold calculation unit 365 receives the band-controlled narrowband masking threshold N_th[f,w] (w=0,1,…,M_C-1) output from the band control unit 363 as Dnb-dimensional feature data, reads the codebook dictionary λ3_q = {μ'x_q, μ'y_q} (q=1,…,Q) from the dictionary storage unit 364, and obtains the wideband masking threshold N_wb_th1[f,w] (w=0,1,…,2M-1) of the ambient noise from the correspondence between the Dnb-dimensional narrowband feature data and the Dwb-dimensional wideband feature data. Specifically, it evaluates the approximate polynomial of each of the Q narrowband centroid vectors μ'x_q (q=1,…,Q) to find the one closest to the band-controlled narrowband masking threshold N_th[f,w] (w=0,1,…,M_C-1) under a predetermined distance measure, takes the wideband centroid vector μ'y_q of that code vector directly as the approximate polynomial coefficients of the wideband masking threshold, and calculates N_wb_th1[f,w] (w=0,1,…,2M-1) in the same manner as in equation (20).

Expressing the narrowband masking threshold by approximate polynomial coefficients as well and storing those in the dictionary reduces the memory needed to store the dictionary, not only relative to storing the masking thresholds themselves but also relative to the method of FIG. 15. Because the dictionary arrays are smaller, the amount of processing when the dictionary is used is also reduced.

(Second embodiment)
FIG. 19(a) shows the configuration of a communication apparatus according to the second embodiment of the present invention.

The communication apparatus shown in this figure is, for example, the receiving system of a wireless communication apparatus such as a mobile phone, and comprises a wireless communication unit 1, a decoder 2, a signal processing unit 3A, a digital-to-analog (D/A) converter 4, a speaker 5, a microphone 6, an analog-to-digital (A/D) converter 7, a downsampling unit 8, an echo suppression processing unit 9, and an encoder 10.

As in the first embodiment, the present invention can be applied not only to a communication apparatus such as that in FIG. 19(a) but also to the digital audio player shown in FIG. 19(b). It can also be applied to the voice band extension call apparatus shown in FIG. 19(c).

Next, the signal processing unit 3A is described. FIG. 20 shows its configuration. The signal processing unit 3A is formed by adding an ambient noise suppression processing unit 37 to the signal processing unit 3 described in the first embodiment. In the following, components identical to those in the embodiments above carry the same reference numbers, and duplicate description is omitted where appropriate.

FIG. 21 shows a configuration example of the ambient noise suppression processing unit 37. The ambient noise suppression processing unit 37 comprises a suppression gain calculation unit 371, a spectrum suppression unit 372, a power calculation unit 373, and a time domain conversion unit 374.

The ambient noise suppression processing unit 37 uses the power spectrum of the ambient noise, the power spectrum of the collected sound signal z[n], and the frequency spectrum of the collected sound signal z[n], all output from the ambient noise estimation unit 31, to suppress the noise component (the ambient noise) contained in z[n], and outputs the resulting noise-suppressed signal s[n] to the encoder 10. The encoder 10 encodes the signal s[n] output from the ambient noise suppression processing unit 37 and outputs it to the wireless communication unit 1.

The suppression gain calculation unit 371 uses the power spectrum |Z[f,w]|² (w=0,1,…,M-1) of the collected sound signal z[n] output from the power calculation unit 312, the power spectrum |N[f,w]|² (w=0,1,…,M-1) of the ambient noise output from the frequency spectrum update unit 314, and the power spectrum |S[f-1,w]|² (w=0,1,…,M-1) of the suppressed signal of the previous frame output from the power calculation unit 373 to output a suppression gain G[f,w] (w=0,1,…,M-1) for each frequency band. The suppression gain G[f,w] is calculated, for example, by one of the following algorithms or a combination of them: the spectral subtraction method, a common noise canceller (S. F. Boll, "Suppression of acoustic noise in speech using spectral subtraction", IEEE Trans. Acoustics, Speech, and Signal Processing, vol. ASSP-27, pp. 113-120 (1979)); the Wiener filter method (J. S. Lim, A. V. Oppenheim, "Enhancement and bandwidth compression of noisy speech", Proc. IEEE, vol. 67, no. 12, pp. 1586-1604, Dec. 1979); and the maximum likelihood method (R. J. McAulay, M. L. Malpass, "Speech enhancement using a soft-decision noise suppression filter", IEEE Trans. Acoustics, Speech, and Signal Processing, vol. ASSP-28, no. 2, pp. 137-145, Apr. 1980). Here, as an example, the suppression gain G[f,w] is assumed to be calculated by the Wiener filter method.
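The source names the Wiener filter method but does not give the gain formula. The sketch below shows one common form as an assumption: a decision-directed estimate of the a priori SNR from the previous frame's suppressed power |S[f-1,w]|², followed by G = ξ/(1+ξ). The smoothing factor `alpha`, the gain `floor`, and the function name are illustrative choices, not values from the source.

```python
def wiener_gain(z_pow, n_pow, s_prev_pow, alpha=0.98, floor=0.1, eps=1e-12):
    """Per-bin suppression gain G[f, w].

    z_pow      -- |Z[f, w]|^2, power spectrum of the collected signal
    n_pow      -- |N[f, w]|^2, estimated ambient noise power spectrum
    s_prev_pow -- |S[f-1, w]|^2, suppressed power of the previous frame
    alpha, floor and the decision-directed rule are assumptions of this sketch.
    """
    gains = []
    for z, n, s_prev in zip(z_pow, n_pow, s_prev_pow):
        n = max(n, eps)                       # avoid division by zero
        post_snr = z / n                      # a posteriori SNR
        prio_snr = alpha * (s_prev / n) + (1.0 - alpha) * max(post_snr - 1.0, 0.0)
        gains.append(max(prio_snr / (1.0 + prio_snr), floor))
    return gains


# Two bins: one dominated by the target signal, one containing only noise.
g = wiener_gain([4.0, 1.0], [1.0, 1.0], [3.0, 0.0], alpha=0.5, floor=0.1)
```

The gain floor limits how far a bin can be attenuated, which is a common way to reduce musical-noise artifacts.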

The spectrum suppression unit 372 receives the frequency spectrum Z[f,w] of the collected sound signal z[n] output from the frequency domain conversion unit 311 and the suppression gain G[f,w] output from the suppression gain calculation unit 371. It separates Z[f,w] into the amplitude spectrum |Z[f,w]| (w=0,1,…,M-1) and the phase spectrum θ_Z[f,w] (w=0,1,…,M-1) of z[n], suppresses the noise component (the ambient noise) by multiplying the amplitude spectrum |Z[f,w]| by the suppression gain G[f,w], and takes the result as the amplitude spectrum |S[f,w]| of the suppressed signal. The phase spectrum θ_Z[f,w] is used unchanged as the phase spectrum θ_S[f,w] of the suppressed signal, and the frequency spectrum S[f,w] (w=0,1,…,M-1) of the suppressed signal is calculated from the two.
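The amplitude scaling with unchanged phase can be sketched with Python's standard `cmath` module. The function name `suppress_spectrum` is hypothetical; the spectrum is represented as a list of complex bins.

```python
import cmath


def suppress_spectrum(z_spec, gains):
    """Scale the amplitude of each bin by its gain and keep the phase.

    z_spec -- complex frequency spectrum Z[f, w]
    gains  -- suppression gains G[f, w]
    Returns the complex spectrum S[f, w] of the suppressed signal.
    """
    s_spec = []
    for z, g in zip(z_spec, gains):
        amplitude = abs(z)            # |Z[f, w]|
        phase = cmath.phase(z)        # theta_Z[f, w]
        s_spec.append(cmath.rect(g * amplitude, phase))
    return s_spec


s_spec = suppress_spectrum([3 + 4j, -2j], [0.5, 1.0])
s_pow = [abs(s) ** 2 for s in s_spec]   # |S[f, w]|^2, fed back for the next frame
```

For a real, non-negative gain this is numerically the same as multiplying the complex bin by the gain; the explicit split simply mirrors the amplitude and phase description in the text.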

The power calculation unit 373 calculates and outputs the power spectrum |S[f,w]|² (w=0,1,…,M-1) of the suppressed signal from the frequency spectrum S[f,w] (w=0,1,…,M-1) of the suppressed signal output from the spectrum suppression unit 372.

The time domain conversion unit 374 receives the frequency spectrum S[f,w] (w=0,1,…,M-1) of the suppressed signal output from the spectrum suppression unit 372, applies processing that converts the frequency domain to the time domain (for example, an IFFT), adds the suppressed signal s[n] of the previous frame as appropriate to account for the overlap introduced by windowing in the frequency domain conversion unit 311, and calculates the suppressed time-domain signal s[n] (n=0,1,…,N-1).
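The overlap handling in the time domain conversion can be illustrated with a minimal overlap-add routine. It assumes each frame has already been converted back to the time domain (for example by an IFFT) and that frames advance by a fixed `hop`; the window shape and overlap ratio used by the frequency domain conversion unit 311 are not given in this section, so the 50% overlap in the example is an assumption.

```python
def overlap_add(frames, hop):
    """Sum equal-length time-domain frames, each shifted by hop samples
    relative to the previous one."""
    frame_len = len(frames[0])
    out = [0.0] * (hop * (len(frames) - 1) + frame_len)
    for i, frame in enumerate(frames):
        for n, sample in enumerate(frame):
            out[i * hop + n] += sample
    return out


# Two 4-sample frames with 50% overlap (hop = 2).
s = overlap_add([[1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0]], hop=2)
```

In the overlapping region the tail of the previous frame is added to the head of the current frame, which corresponds to adding the previous frame's suppressed signal as described above.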

As described above, combining the ambient noise suppression processing with the ambient noise estimation processing keeps the increase in processing load small while clarifying the input signal and, at the same time, suppressing the ambient noise component in the collected sound signal, so that a high-quality collected sound signal is obtained.

The present invention is not limited to the embodiments exactly as described above; at the implementation stage, the constituent elements may be modified without departing from the gist of the invention. Various inventions can also be formed by appropriately combining the constituent elements disclosed in the embodiments. For example, a configuration in which some constituent elements are removed from all those shown in an embodiment is conceivable. Furthermore, constituent elements described in different embodiments may be combined as appropriate.

For example, the sampling frequency of the input signal (or target signal) is not limited to twice the sampling frequency of the collected sound signal (or ambient noise); it may be any integer or non-integer multiple. The sampling frequency of the input signal (or target signal) may also be equal to that of the collected sound signal (or ambient noise) while the band-limited frequency range of the input signal (or target signal) differs from that of the collected sound signal (or ambient noise). Furthermore, the band-limited frequency range of the input signal (or target signal) need not include that of the collected sound signal (or ambient noise). Moreover, the band-limited frequency range of the input signal (or target signal) need not be adjacent to that of the collected sound signal (or ambient noise).

Even when the input signal is a stereo signal rather than a monaural signal, the same effect is obtained by, for example, applying the signal processing of the signal processing unit 3 to the L (left) channel and the R (right) channel separately, or by applying it to the sum signal (the sum of the L-channel and R-channel signals) and the difference signal (the L-channel signal minus the R-channel signal) separately. Likewise, for a multichannel signal, the same effect is obtained by, for example, applying the above signal processing to each channel signal.

It goes without saying that the invention can likewise be implemented with various other modifications that do not depart from its gist.

Description of symbols: 1 … wireless communication unit; 2, 2A … decoder; 3, 30, 300, 3A … signal processing unit; 4 … digital-to-analog (D/A) converter; 5 … speaker; 6 … microphone; 7 … analog-to-digital (A/D) converter; 8 … downsampling unit; 9 … echo suppression processing unit; 10 … encoder; 11 … storage unit; 12 … signal band extension processing unit; 31 … ambient noise estimation unit; 32, 34, 36 … ambient noise information band extension unit; 33, 35 … signal characteristic correction unit; 37 … ambient noise suppression processing unit; 311, 331 … frequency domain conversion unit; 312, 352, 373 … power calculation unit; 313 … ambient noise section determination unit; 314 … frequency spectrum update unit; 321 … power normalization unit; 322, 342, 364 … dictionary storage unit; 323 … wideband power calculation unit; 332, 356 … correction degree determination unit; 333 … correction processing unit; 334, 374 … time domain conversion unit; 343 … wideband power spectrum calculation unit; 344, 365 … wideband masking threshold calculation unit; 345 … power control unit; 353 … masking threshold calculation unit; 354 … masking determination unit; 355 … power smoothing unit; 362 … narrowband masking threshold calculation unit; 363 … band control unit; 366 … threshold correction unit; 371 … suppression gain calculation unit; 372 … spectrum suppression unit.

Claims

1. A signal processing apparatus that changes a frequency characteristic of an input signal band-limited to a first frequency range, the apparatus comprising:
ambient noise extraction means for extracting ambient noise contained in a collected sound signal;
information extraction means for extracting frequency characteristic information of a second frequency range from the ambient noise extracted by the ambient noise extraction means;
storage means for storing, in association with each other, frequency characteristic information of the second frequency range and frequency characteristic information of the first frequency range of a signal acquired in advance;
frequency characteristic information extension means for estimating, from the frequency characteristic information extracted by the information extraction means and using the correspondence between the frequency characteristic information of the second frequency range and the frequency characteristic information of the first frequency range stored in the storage means, frequency characteristic information of a third frequency range, which is the first frequency range excluding the second frequency range, and for extending the frequency characteristic information in the frequency direction to the first frequency range by correcting the estimated frequency characteristic information and the frequency characteristic information extracted by the information extraction means so that they are continuous at the boundary between the second frequency range and the third frequency range; and
signal correction means for changing the frequency characteristic of the input signal according to the frequency characteristic information obtained by the frequency characteristic information extension means.

2. The signal processing apparatus according to claim 1, wherein the frequency characteristic information extracted by the information extraction means is a masking level for each frequency.

3. The signal processing apparatus according to claim 2, wherein the masking level for each frequency extracted by the information extraction means is approximated by a polynomial.

4. A signal processing apparatus that changes a frequency characteristic of an input signal band-limited to a first frequency range, the apparatus comprising:
ambient noise extraction means for extracting ambient noise contained in a collected sound signal;
information extraction means for extracting, from the ambient noise extracted by the ambient noise extraction means, frequency characteristic information of a second frequency range narrower than the first frequency range;
storage means for storing, in association with each other, frequency characteristic information of the second frequency range and frequency characteristic information of the first frequency range of a signal acquired in advance;
frequency characteristic information extension means for estimating, from the frequency characteristic information extracted by the information extraction means and using the correspondence between the frequency characteristic information of the second frequency range and the frequency characteristic information of the first frequency range stored in the storage means, frequency characteristic information of a third frequency range, which is the first frequency range excluding the second frequency range, and for extending the frequency characteristic information in the frequency direction to the first frequency range by correcting the estimated frequency characteristic information and the frequency characteristic information extracted by the information extraction means so that they are continuous at the boundary between the second frequency range and the third frequency range; and
signal correction means for changing the frequency characteristic of the input signal according to the frequency characteristic information obtained by the frequency characteristic information extension means.

5. The signal processing apparatus according to claim 4, wherein the frequency characteristic information extracted by the information extraction means is a masking level for each frequency.

6. The signal processing apparatus according to claim 5, wherein the masking level for each frequency extracted by the information extraction means is approximated by a polynomial.
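
The band extension recited in claims 1 and 4 (look up stored narrowband/wideband pairs, estimate the missing band, then correct for continuity at the band boundary) can be illustrated with a minimal sketch. This is not the implementation described in the specification. It assumes that the frequency characteristic information is a list of per-bin levels in dB, that the stored correspondence is a small list of (narrowband levels, extension levels) pairs, that matching is nearest-neighbour on the mean-removed shape, and that continuity is obtained by shifting the stored entry so that its boundary bin coincides with the observed boundary bin. The names `nearest_entry`, `extend_band`, and `codebook` are hypothetical.

```python
def _mean_removed(levels):
    """Subtract the mean so that only the spectral shape is compared."""
    mean = sum(levels) / len(levels)
    return [v - mean for v in levels]


def nearest_entry(narrow_db, codebook):
    """Return the stored (narrow, extension) pair whose narrowband
    shape is closest to the observed narrowband shape."""
    shape = _mean_removed(narrow_db)

    def distance(entry):
        stored = _mean_removed(entry[0])
        return sum((a - b) ** 2 for a, b in zip(shape, stored))

    return min(codebook, key=distance)


def extend_band(narrow_db, codebook):
    """Extend observed narrowband levels (second frequency range)
    with estimated levels for the remaining band (third frequency range)."""
    stored_narrow, stored_ext = nearest_entry(narrow_db, codebook)
    # Assumed continuity correction: shift the stored entry so that its
    # last narrowband bin lines up with the last observed bin.
    offset = narrow_db[-1] - stored_narrow[-1]
    return list(narrow_db) + [level + offset for level in stored_ext]


# Hypothetical stored pairs: a falling spectrum and a flat spectrum.
codebook = [
    ([60.0, 55.0, 50.0], [45.0, 40.0]),
    ([40.0, 40.0, 40.0], [40.0, 40.0]),
]

observed = [70.0, 66.0, 61.0]
print(extend_band(observed, codebook))  # [70.0, 66.0, 61.0, 56.0, 51.0]
```

In this sketch the observed levels are kept unchanged and only the estimated part is shifted. A correction that adjusts both parts, as the claim wording permits, would be an equally valid reading.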