JP2002044793A

JP2002044793A - Method and apparatus for sound signal processing

Info

Publication number: JP2002044793A
Application number: JP2000223616A
Authority: JP
Inventors: Koji Kushida; 孝司櫛田
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2000-07-25
Filing date: 2000-07-25
Publication date: 2002-02-08

Abstract

PROBLEM TO BE SOLVED: To emphasize or suppress only a specific component of a signal localized at a specific position in a stereo audio signal, without losing the stereo image. SOLUTION: In this audio signal processing method, input audio signals of a plurality of channels are frequency-analyzed channel by channel, and a specific component to be emphasized or suppressed is designated. Only within the region of the designated component, an inter-channel arithmetic operation is performed on the complex spectra constituting that component, so that the component is emphasized or suppressed. Finally, the frequency-domain audio signal of each channel is returned to the time domain and output.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to an audio signal processing method and apparatus for emphasizing or suppressing the signal of a specific component in a stereo audio signal.

[0002]

2. Description of the Related Art

Conventionally, to cancel a center-localized vocal in a stereo source, it has been common practice to subtract the right-channel audio signal from the left-channel audio signal in the time domain (hereinafter "L−R processing"). Refinements have also been made, such as restricting the L−R processing to the voice band by filtering (see, e.g., JP-A-5-35283).

[0003]

However, with such time-domain L−R processing, the output is in principle monaural, at least in the voice band, even when the input is a stereo signal. One workaround (JP-A-7-64577) gives the once-monaural signal a sense of width by pseudo-stereo processing, but the original stereo image of the source is still impaired.
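To make this limitation concrete, here is a minimal Python sketch of plain time-domain L−R processing as described in [0002]. The function name `lr_process` and the sample values are hypothetical, and the voice-band filtering of the cited prior art is omitted. Because a single difference signal is the only result, both output channels carry the same monaural signal.

```python
def lr_process(left, right):
    """Time-domain L-R processing: subtract the right channel from the left, sample by sample."""
    if len(left) != len(right):
        raise ValueError("left and right channels must have the same length")
    return [l - r for l, r in zip(left, right)]


# Hypothetical test signals: a centre-panned "vocal" (equal in L and R)
# and a left-leaning "guitar" (80 % in L, 20 % in R).
vocal = [0.5, -0.25, 0.75, 0.0]
guitar = [0.2, 0.4, -0.2, 0.1]
left = [v + 0.8 * g for v, g in zip(vocal, guitar)]
right = [v + 0.2 * g for v, g in zip(vocal, guitar)]

diff = lr_process(left, right)
# The vocal cancels, but only one difference signal remains, so the
# "stereo" output simply repeats it on both channels.
out_left, out_right = diff, list(diff)
```

Here `diff` equals 0.6 times the guitar: the vocal is gone, but the guitar has lost its left-leaning position and part of its level.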

SUMMARY OF THE INVENTION

[0004] The present invention has been made in view of this problem. Its object is to provide an audio signal processing method and apparatus capable of emphasizing or suppressing (canceling) only a specific component of the signals localized at a particular position, for example near the center, without greatly impairing the stereo image of a stereo audio signal.

[0005]

An audio signal processing method according to the present invention is characterized in that: input audio signals of a plurality of channels are frequency-analyzed channel by channel to output a frequency-domain audio signal for each channel; a specific component to be emphasized or suppressed in the input multichannel audio signals is designated; the specific component is emphasized or suppressed by performing an inter-channel arithmetic operation on the complex spectra constituting it, only within the region of the designated specific component in each channel's frequency-domain audio signal; and the frequency-domain audio signal of each channel, in which the specific component has been emphasized or suppressed, is returned to the time domain by inverse transformation and output.

[0006] An audio signal processing apparatus according to the present invention comprises: a plurality of frequency analysis means for frequency-analyzing input audio signals of a plurality of channels channel by channel and outputting a frequency-domain audio signal for each channel; specific component designation means for designating a specific component of the input audio signals to be emphasized or suppressed; inter-specific-component operation means for emphasizing or suppressing the specific component by performing an inter-channel arithmetic operation on the complex spectra constituting it, only within the region of the specific component designated by the specific component designation means; and a plurality of inverse transform means for returning the frequency-domain audio signal of each channel, in which the specific component has been emphasized or suppressed by the inter-specific-component operation means, to the time domain by inverse transformation and outputting it.

[0007] According to the present invention, only the audio signal of a specific component, determined for example by pitch, is emphasized or suppressed in the frequency-domain audio signal of each channel, while the audio signals outside that component change little, so the original stereo image is largely preserved.

[0008] In a typical embodiment of the present invention, when the input audio signal has a harmonic structure, the components determined by its pitch are designated as the specific component to be emphasized or suppressed. If the pitch-determined components are designated only when the pitch lies within a preset range, only the audio signal with the intended harmonic structure is emphasized or suppressed, and the number of components to be computed is reduced at the same time. The amount of computation can also be reduced by designating as the specific component only those pitch-determined components that lie within a predetermined frequency range and/or exceed a predetermined level.

[0009] When the localization position is known in advance or can be estimated, the coefficients assigned to each channel in the inter-channel operation can be determined from that position. In this case, audio signals localized at other positions are hardly affected by the operation, which has the advantage that a signal localized at a different position that happens to overlap the specific component is affected only slightly. When the localization positions of the multichannel audio signals are not known in advance, the localization position may be estimated from the inter-channel ratio of the audio signals of the designated specific component, and the coefficients of the operation between specific components adjusted according to that estimate.

[0010]

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Preferred embodiments of the present invention are described below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of an audio signal processing apparatus according to an embodiment of the present invention. The left- and right-channel stereo audio input signals S_IL and S_IR are input to frequency analysis units 10 and 20, respectively, for frequency analysis, and are also input to a specific component designation unit 30, which extracts the specific component to be designated. The analysis results of the frequency analysis units 10 and 20 and the designation from the specific component designation unit 30 are input to an inter-specific-component operation unit 40, which performs emphasis/suppression processing by operations between components in the frequency domain, only for the designated specific component. The results are input to inverse transform units 50 and 60, respectively, converted back to time-domain waveforms for the L and R channels, and output as stereo audio output signals S_OL and S_OR.

[0011] The frequency analysis unit 10 comprises a time division unit 11 that divides the time waveform of the audio input signal S_IL into analysis units of a predetermined length, a windowing unit 12 that applies a predetermined window function such as a Hamming window to each time-divided waveform, and an FFT operation unit 13 that frequency-analyzes the windowed waveform by FFT (fast Fourier transform). The frequency analysis unit 20 likewise comprises a time division unit 21, a windowing unit 22 and an FFT operation unit 23.

[0012] The specific component designation unit 30 comprises an adder 31 that adds the audio input signals S_IL and S_IR, a pitch extraction unit 32 that extracts a pitch from the signal summed by the adder 31, and a harmonic component designation unit 33 that designates components based on the harmonic structure derived from the pitch extracted by the pitch extraction unit 32.

[0013] The inverse transform unit 50 comprises an inverse FFT unit 51 that converts the frequency-domain signal resulting from the operation in the inter-specific-component operation unit 40 into a time-domain signal, and a synthesis unit 52 that joins the analysis units together. The inverse transform unit 60 likewise comprises an inverse FFT unit 61 and a synthesis unit 62.

[0014] Next, the operation of the audio signal processing apparatus configured as above is described, taking as an example the suppression of a sound with a harmonic structure localized at the center of the stereo audio input signals S_IL and S_IR. The stereo audio input signals S_IL and S_IR are input to the frequency analysis units 10 and 20, respectively, and frequency-analyzed. FIGS. 2(a) and 2(b) show an example of FFT frequency analysis results: FIG. 2(a) shows the L-channel result output from the frequency analysis unit 10, and FIG. 2(b) shows the R-channel result output from the frequency analysis unit 20. The frequency analysis method is not limited to the FFT; with the FFT, the result is in fact a complex spectrum. For simplicity, this example assumes that the stereo audio input signals S_IL and S_IR contain only two sounds with harmonic structures, one localized at the center and the other localized toward the left. Components without a harmonic structure may of course also be present.

[0015] Meanwhile, the specific component designation unit 30 designates a specific component based on the harmonic structure by the processing shown in FIG. 3. First, the designation information for the previous specific component is cleared (S1), and the pitch frequency P is extracted from the audio input signal of the newly input analysis unit (S2). For pitch extraction, the L- and R-channel signals are added in the time domain by the adder 31, and the pitch extraction unit 32 analyzes the frequency of the sum. FIG. 2(c) shows the frequency analysis result of L+R; in this example, the center-localized component is emphasized. As the pitch extraction method, for example, a method that extracts the pitch of the most dominant signal in a monaural signal (e.g., JP-A-5-19793) can be used. In that method, a prediction residual signal is obtained with an inverse filter, a set number of pitch frequency candidates based on spectral products are extracted, and the most frequently occurring candidate is taken as the pitch frequency. Alternatively, applying a method that detects multiple pitches, for example by treating every frequency component exceeding a predetermined fundamental level and/or harmonic level as a pitch frequency (e.g., JP-A-6-202627), allows the same processing to be performed for a plurality of pitches.
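The adder 31 and pitch extraction unit 32 can be sketched as follows. The patent cites a prediction-residual method (JP-A-5-19793); this sketch swaps in a simpler normalized autocorrelation estimator instead, so it is not the cited method. The search range (80 to 1000 Hz), the 0.3 voicing threshold, the function names and the test mix are all hypothetical. Returning `None` corresponds to the no-pitch case (S3).

```python
import math


def add_channels(left, right):
    """Adder 31: sum L and R in the time domain, which favours centre-panned sources."""
    return [l + r for l, r in zip(left, right)]


def estimate_pitch(x, fs, fmin=80.0, fmax=1000.0, threshold=0.3):
    """Return the pitch in Hz whose lag maximises the normalised autocorrelation, or None."""
    energy = sum(s * s for s in x)
    if energy == 0.0:
        return None                                   # silence: no pitch (cf. S3)
    best_lag, best_r = None, threshold                # below threshold counts as unvoiced
    for lag in range(max(1, int(fs / fmax)), int(fs / fmin) + 1):
        if lag >= len(x):
            break
        r = sum(x[t] * x[t + lag] for t in range(len(x) - lag)) / energy
        if r > best_r:
            best_lag, best_r = lag, r
    return None if best_lag is None else fs / best_lag


# Hypothetical mix: a 200 Hz centre "vocal" with one harmonic, plus a quiet
# 330 Hz tone present only in the left channel.
fs = 8000
vocal = [math.sin(2 * math.pi * 200 * t / fs) + 0.5 * math.sin(2 * math.pi * 400 * t / fs)
         for t in range(800)]
guitar = [math.sin(2 * math.pi * 330 * t / fs) for t in range(800)]
left = [v + 0.2 * g for v, g in zip(vocal, guitar)]
right = vocal
pitch = estimate_pitch(add_channels(left, right), fs)
```

Summing the channels first doubles the center-panned vocal relative to the left-only tone, which is why the estimate locks onto the 200 Hz vocal.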

[0016] Once the pitch has been extracted, the harmonic structure implies that each harmonic lies at an integer multiple of the fundamental frequency, so regions (the specific component) around the fundamental and each harmonic position can be set as shown in FIG. 2(d). When no pitch is extracted (S3), for example because the target sound (e.g., a vocal) is absent, no region is set, and the subsequent operation between specific components is unnecessary (S12). Processing may also be performed only when the pitch falls within a certain range (S4); this prevents components other than the target (e.g., voice) from being removed and, at the same time, limits what is processed and so reduces the amount of computation. When multiple pitches can be detected, the processing targets may be limited in consideration of the level of each harmonic and its change over time.

[0017] The width of each component's region in the frequency direction can be determined in consideration of the main-lobe width of the window function used in the windowing units 12 and 22 of the frequency analysis units 10 and 20. Where to stop, that is, up to which harmonic to process, can be decided from prior information about the target and from the actually extracted pitch, the frequency of each harmonic, the actual levels and so on. In the processing of FIG. 3, for example, an upper limit on the harmonic number and upper and lower limits on the frequency are set in advance. Using i as the harmonic-number parameter (S5, S6), f = i × P is calculated (S8); until i exceeds its upper limit (S7), and for f from the point where it reaches the preset lower limit (S9) until it exceeds the preset upper limit (S10), the area around the position given by the fundamental or harmonic frequency f is designated as the specific component (S11). When i exceeds its preset upper limit, or f exceeds its preset upper limit, designation of the specific component ends (S13).
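The designation loop of FIG. 3 can be sketched as a function returning the set of FFT bins designated as the specific component. All parameter values and the function name are assumptions for illustration, and the step labels in the comments follow the description in [0016] and [0017] rather than the flowchart itself, which is not reproduced here. The half-width of two bins on each side of a harmonic assumes a Hamming window, whose main lobe spans about four bins.

```python
def designate_harmonic_bins(pitch, bin_hz, max_harmonic=20, f_low=80.0, f_high=4000.0,
                            pitch_range=(80.0, 1000.0), half_width=2):
    """Return the set of bin indices around the fundamental and harmonics of `pitch`."""
    bins = set()                                    # S1: clear previous designation
    if pitch is None:                               # S3: no pitch, skip processing (S12)
        return bins
    if not pitch_range[0] <= pitch <= pitch_range[1]:
        return bins                                 # S4: pitch outside the target range
    for i in range(1, max_harmonic + 1):            # S5-S7: harmonic number i up to its limit
        f = i * pitch                               # S8: i-th harmonic frequency
        if f > f_high:                              # S10: frequency upper limit, end (S13)
            break
        if f < f_low:                               # S9: not yet above the lower limit
            continue
        centre = round(f / bin_hz)
        for k in range(centre - half_width, centre + half_width + 1):
            if k >= 0:
                bins.add(k)                         # S11: region around f
    return bins
```

For example, a 200 Hz pitch with 10 Hz bins and a 1000 Hz upper limit designates bins 18 to 22, 38 to 42, and so on up to 98 to 102.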

[0018] The inter-specific-component operation unit 40 performs operations between spectra, as conceptually shown in FIGS. 2(a) and 2(b), only on the regions set in this way (the regions shown in FIG. 2(d)). When the FFT is used, performing the operation between specific components in the frequency domain can be written as follows. Let X_L[i] be the complex spectrum of the i-th component of the L channel, X_R[i] that of the R channel, and Y_L[i] and Y_R[i] the resulting complex spectra of the respective channels. Then, for components included in the specific component:

[0019]

[Equation 1]
$$Y_L[i] = \alpha_{LL} X_L[i] + \alpha_{RL} X_R[i]$$
$$Y_R[i] = \alpha_{LR} X_L[i] + \alpha_{RR} X_R[i]$$

[0020] and, for components not included in the specific component:

[0021]

[Equation 2]
$$Y_L[i] = \beta X_L[i]$$
$$Y_R[i] = \beta X_R[i]$$

[0022] Here, α_LL, α_RL, α_LR, α_RR and β are coefficients determined by the type of processing (suppression or emphasis) and by localization information. β can be used for an overall level adjustment to avoid clipping during emphasis; for suppression, β = 1.0 is acceptable. The coefficient set (α_LL, α_RL, α_LR, α_RR, β) can be determined as follows.

[0023]

[Equation 3] Coefficient sets (α_LL, α_RL, α_LR, α_RR, β):
(1) Emphasizing a center-localized specific component: (0.5, 0.5, 0.5, 0.5, 0.5)
(2) Suppressing a center-localized specific component by L−R: (1.0, −1.0, 1.0, −1.0, 1.0)
(3) Suppressing a center-localized specific component by R−L: (−1.0, 1.0, −1.0, 1.0, 1.0)
(4) Mixing (2) and (3): (1.0, −1.0, −1.0, 1.0, 1.0) or (−1.0, 1.0, 1.0, −1.0, 1.0)
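Equations 1 to 3 amount to a 2×2 mix applied only to the designated bins, with a plain gain β everywhere else. A minimal sketch, assuming one complex spectrum per channel per frame given as Python lists plus a set of designated bin indices; the dictionary keys and the function name are hypothetical labels, not terms from the patent.

```python
# (alpha_LL, alpha_RL, alpha_LR, alpha_RR, beta) for sets (1) to (4) of Equation 3.
COEFFICIENT_SETS = {
    "emphasize_center": (0.5, 0.5, 0.5, 0.5, 0.5),
    "suppress_l_minus_r": (1.0, -1.0, 1.0, -1.0, 1.0),
    "suppress_r_minus_l": (-1.0, 1.0, -1.0, 1.0, 1.0),
    "suppress_mixed": (1.0, -1.0, -1.0, 1.0, 1.0),
}


def process_frame(xl, xr, designated, coeffs):
    """Apply Equation 1 to designated bins and Equation 2 to all other bins."""
    a_ll, a_rl, a_lr, a_rr, beta = coeffs
    yl, yr = [], []
    for i, (l, r) in enumerate(zip(xl, xr)):
        if i in designated:                 # inside the specific component
            yl.append(a_ll * l + a_rl * r)
            yr.append(a_lr * l + a_rr * r)
        else:                               # outside: level adjustment only
            yl.append(beta * l)
            yr.append(beta * r)
    return yl, yr
```

With set (1), a center component (X_L = X_R) passes through the designated bins unchanged while every other bin is halved, which is a relative emphasis of about 6 dB.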

[0024] For suppression, when the localization is not exactly at the center but is known in advance, the value of each coefficient may be adjusted according to that information. For example, if the component is known to be localized toward R with L:R = 7:8, the coefficient set may be

[0025]

[Equation 4]
$$\frac{\alpha_{LL}}{\alpha_{RL}} = \frac{\alpha_{LR}}{\alpha_{RR}} = -\frac{8}{7}, \qquad \beta = 1.0$$

[0026] chosen so as to satisfy this. Further, when the localization is not given in advance, it may be calculated from the proportions of the target specific component in L and R of the actual data, and the set of α values chosen accordingly, in the same way as when the localization is known. In practice, localization can be estimated simply by finding, for each region designated as a specific component, the amplitude of the spectral peak in each of L and R, and, for a harmonic component, averaging the amplitudes of, for example, the principal 1st to n-th harmonics. n may be determined based on information about the target sound source.
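The estimate described in [0026] and the coefficient choice of [0024] can be combined as below. This sketch assumes the target's harmonic peaks are in phase in L and R (so a real gain relates them), uses the mean peak magnitude over the first n harmonics, and fixes α_LL = α_LR = 1.0 as an arbitrary normalization, since the patent constrains only the ratios. With L:R = a:b the cancellation condition α_LL·a + α_RL·b = 0 gives α_LL/α_RL = −b/a, which is −8/7 for the 7:8 example of Equation 4. The function names and peak values are hypothetical.

```python
def estimate_localization(peaks_l, peaks_r, n=5):
    """Mean peak magnitude of the first n harmonics in L and in R."""
    k = min(n, len(peaks_l), len(peaks_r))
    if k == 0:
        raise ValueError("at least one harmonic peak is required")
    mean_l = sum(abs(p) for p in peaks_l[:k]) / k
    mean_r = sum(abs(p) for p in peaks_r[:k]) / k
    return mean_l, mean_r


def suppression_coefficients(mean_l, mean_r):
    """Set (alpha_LL, alpha_RL, alpha_LR, alpha_RR, beta) cancelling a component with L:R = mean_l:mean_r."""
    if mean_r == 0.0:
        raise ValueError("component is absent from R; this coefficient form cannot cancel it")
    g = mean_l / mean_r          # gives alpha_LL / alpha_RL = -mean_r / mean_l
    return (1.0, -g, 1.0, -g, 1.0)


# Hypothetical peak amplitudes of the first three harmonics, localized at L:R = 7:8.
mean_l, mean_r = estimate_localization([0.7, 0.35, 0.14], [0.8, 0.4, 0.16])
coeffs = suppression_coefficients(mean_l, mean_r)
```

For a center-localized component (equal means) this reduces to set (2) of Equation 3.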

[0027] In the example of FIG. 2, using coefficient set (2), (3) or (4) of Equation 3 cancels the center-localized target component, but not the component localized toward the left. In this example, the 10th harmonic of the left-localized harmonic sound happens to fall within the region set for the 7th harmonic of the specific component. Because this component is not localized at the center, its L−R result is not zero; it is affected by the operation but not canceled. In this respect, canceling so that the result becomes zero differs from simply setting the designated region to zero. Moreover, components other than those designated as the specific component basically keep their values in each of L and R, so the results of the operation between specific components are as shown in FIG. 2(e) for L and FIG. 2(f) for R: only the dominant center-localized component is suppressed, while the components localized toward the left remain almost unchanged.
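The difference between canceling and zeroing can be seen on a toy four-bin frame. The bin values and function names below are invented for illustration: bin 1 holds only a center harmonic, bin 2 holds a center harmonic overlapped by a left-leaning one (as with the 7th and 10th harmonics of FIG. 2), bin 3 holds only the left-leaning sound and is not designated, and bin 0 holds center energy outside the designated region.

```python
def suppress_center(xl, xr, designated):
    """Coefficient set (2) of Equation 3: L-R in designated bins, beta = 1.0 elsewhere."""
    yl = [l - r if i in designated else l for i, (l, r) in enumerate(zip(xl, xr))]
    yr = [l - r if i in designated else r for i, (l, r) in enumerate(zip(xl, xr))]
    return yl, yr


def zero_region(xl, xr, designated):
    """Naive alternative for comparison: simply set the designated bins to zero."""
    yl = [0.0 if i in designated else l for i, l in enumerate(xl)]
    yr = [0.0 if i in designated else r for i, r in enumerate(xr)]
    return yl, yr


centre = [0.2, 0.6 + 0.2j, 0.3, 0.0]        # identical in L and R
left_l = [0.0, 0.0, 0.4, 0.5]               # left-leaning sound, L channel
left_r = [0.0, 0.0, 0.1, 0.125]             # left-leaning sound, R channel
xl = [c + a for c, a in zip(centre, left_l)]
xr = [c + b for c, b in zip(centre, left_r)]
designated = {1, 2}

yl, yr = suppress_center(xl, xr, designated)
zl, zr = zero_region(xl, xr, designated)
```

In bin 2 the center harmonic cancels, but the left-leaning harmonic's own L−R (0.4 − 0.1 = 0.3) survives, whereas zeroing discards it; bins 0 and 3 pass through unchanged.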

[0028] The frequency-domain audio signals of the channels, in which the audio signal of the specific component has been emphasized or suppressed in this way, are converted back into time-domain audio signals by the inverse transform units 50 and 60, yielding audio output signals S_OL and S_OR in which only the specific component is emphasized or suppressed.

[0029]

As described above, according to the present invention, only the audio signal of a specific component, determined for example by pitch, is emphasized or suppressed in the frequency-domain audio signal of each channel, and the audio signals outside that component are not changed at all, so the original stereo image is rarely impaired.

[Brief description of the drawings]

FIG. 1 is a block diagram of an audio signal processing apparatus according to an embodiment of the present invention.

FIG. 2 is a diagram showing the processing results of each unit of the apparatus.

FIG. 3 is a flowchart showing the method of designating the specific component in the apparatus.

[Explanation of symbols]

10, 20 ... frequency analysis units; 11, 21 ... time division units; 12, 22 ... windowing units; 13, 23 ... FFT operation units; 30 ... specific component designation unit; 31 ... adder; 32 ... pitch extraction unit; 33 ... harmonic component designation unit; 40 ... inter-specific-component operation unit; 50, 60 ... inverse transform units; 51, 61 ... inverse FFT units; 52, 62 ... synthesis units.

Claims


1. An audio signal processing method comprising: frequency-analyzing input audio signals of a plurality of channels channel by channel to output a frequency-domain audio signal for each channel; designating a specific component of the input multichannel audio signals to be emphasized or suppressed; emphasizing or suppressing the specific component by performing an inter-channel arithmetic operation on the complex spectra constituting the specific component, only within the region of the designated specific component in the frequency-domain audio signal of each channel; and returning the frequency-domain audio signal of each channel, in which the specific component has been emphasized or suppressed, to the time domain by inverse transformation and outputting it.

2. The audio signal processing method according to claim 1, wherein, when the input audio signal has a harmonic structure, a component determined by its pitch is designated as the specific component to be emphasized or suppressed.

3. The audio signal processing method according to claim 2, wherein a component determined by the pitch is designated as the specific component to be emphasized or suppressed only when the pitch is within a predetermined range.

4. The audio signal processing method according to claim 2, wherein, of the components determined by the pitch, only components within a predetermined frequency range and/or components larger than a predetermined level are designated as the specific component.

5. The audio signal processing method according to any one of claims 1 to 4, wherein, when the localization positions of the audio signals of the plurality of channels are not known in advance, the localization position is estimated from the inter-channel ratio of the audio signals of the designated specific component, and the coefficients of the operation between specific components are adjusted based on the estimation result.

6. An audio signal processing apparatus comprising: a plurality of frequency analysis means for frequency-analyzing input audio signals of a plurality of channels channel by channel and outputting a frequency-domain audio signal for each channel; specific component designation means for designating a specific component of the input audio signals to be emphasized or suppressed; inter-specific-component operation means for emphasizing or suppressing the specific component by performing an inter-channel arithmetic operation on the complex spectra constituting the specific component, only within the region of the specific component designated by the specific component designation means; and a plurality of inverse transform means for returning the frequency-domain audio signal of each channel, in which the specific component has been emphasized or suppressed by the inter-specific-component operation means, to the time domain by inverse transformation and outputting it.

7. The audio signal processing apparatus according to claim 6, wherein the specific component designation means comprises: an adder for adding the input audio signals of the plurality of channels; pitch extraction means for extracting a pitch based on a harmonic structure from the audio signal summed by the adder; and harmonic component designation means for obtaining harmonic components based on the pitch extracted by the pitch extraction means and designating them as the specific component.