JPH10228296A - Acoustic signal separation method - Google Patents
- Publication number
- JPH10228296A
- Authority
- JP
- Japan
- Prior art keywords
- waveform
- candidate
- sound
- fundamental frequency
- input audio
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Electrophonic Musical Instruments (AREA)
Abstract
(57) [Abstract]
[Problem] To separate the signals of individual sound sources with high accuracy even when many sound sources are present and they vary in diverse ways.
[Solution] Onsets of sounds are detected from power fluctuations of the input acoustic signal, and the signal is divided at those onsets (13). For the fundamental frequency of each segmented input acoustic signal z, stored waveforms whose fundamental frequencies fall within a certain range are selected from the storage means 14 as candidates (15). The impulse responses h_n of FIR filters are then found that minimize the mean square error between z and the sum of the FIR-filtered candidates. Each candidate is FIR-filtered with its h_n. If the resulting power is large, the sound source of that candidate is judged to be contained in z.
Description
[0001]
[Technical Field] This invention relates to an acoustic signal separation method. Given an acoustic signal in which sounds from several sound sources are mixed, the method separates and extracts the sound of each individual source contained in that signal.
[0002]
[Prior Art] One known approach to acoustic signal separation uses a filter device that passes only specific frequency bands, such as a comb filter, to separate the sound sources. However, this method cannot separate the sources properly when several of them share a frequency band. Separation is therefore generally difficult when many sound sources are present.
[0003]
Another known method performs frequency analysis on the input acoustic signal and then separates the signal by clustering, based on features of the power spectrum. However, this method processes the signal bottom-up. It therefore cannot handle cases where noise is mixed in or where many sound sources are present.
[0004]
A further known method stores models of sound sources in the device, for example as power spectra. It separates acoustic signals by selecting the model that fits the input acoustic signal and matching it against that signal. However, because the models are fixed, this method cannot cope with the diversity and variation of sound sources. Consequently, none of the above methods can be expected to separate acoustic signals adequately when many sound sources are present and those sources are diverse and variable.
[0005]
[Problem to Be Solved by the Invention] An object of this invention is to provide an acoustic signal separation method that separates acoustic signals adequately even when many sound sources are present and those sources are diverse and variable. In other words, it separates them with higher accuracy than known methods.
[0006]
[Means for Solving the Problem] According to this invention, the method proceeds as follows:
- The input acoustic signal is divided in time.
- All waveforms that may be contained in each segmented input acoustic signal are selected from the waveforms held in a waveform storage means, giving the candidate waveforms.
- The filter coefficients are determined so as to minimize the mean square error between the segmented input acoustic signal waveform and the sum of the filtered candidate waveforms.
- Filtering with the determined coefficients is applied to each candidate waveform.
[0007]
[Embodiments of the Invention] An embodiment of this invention is now described with reference to the drawings. FIG. 1 shows the functional configuration of an acoustic signal separating apparatus to which the method of this invention is applied. The description uses one application of this apparatus as its example: separating a musical performance into the part played by each instrument.
[0008]
The acoustic signal separating apparatus 10 receives the acoustic signal waveform of a mixed sound at input terminal 11. It outputs an acoustic signal waveform for each sound source at output terminal 12. The input acoustic signal (waveform) is sampled at, for example, 48 kHz or 96 kHz, and is input as a time series of the digital sample values. The waveform segmenting means 13 detects the onset components of the sounds contained in the signal from input terminal 11 and divides the signal in time at those onsets. Alternatively, the signal may be divided at fixed time intervals.
[0009]
The waveform storage means 14 stores, in advance, templates of the sound-source waveforms that the apparatus 10 is designed to handle. For each segment, the candidate waveform selecting means 15 analyzes basic sound features of the input acoustic signal waveform, such as the fundamental frequency and power envelope. Using the results, it selects from the waveforms stored in the waveform storage means 14 those that may be contained in that input acoustic signal waveform.
[0010]
The coefficient determining means 16 determines the coefficients of a filter operation applied to each selected waveform. The coefficients are chosen so that the sum of the filtered waveforms has the minimum mean square error with respect to the input acoustic signal waveform. These coefficients are set in the filter operation means 17, which applies the filter operation to each waveform selected by the candidate waveform selecting means 15. The result of each filter operation is sent to output terminal 12 as the separated output for the corresponding sound source.
[0011]
The processing in each of the means 13, 15, 16, and 17 is now described in detail. As shown in FIG. 2, the waveform segmenting means 13 proceeds as follows:
- It reads the input acoustic signal (step 101).
- It detects the onsets of the sounds contained in the signal, based mainly on power fluctuations (step 102).
- It outputs the part of the input acoustic signal from the previously detected onset up to the currently detected onset as a segmented acoustic signal (step 103).
- It checks whether the input acoustic signal is still arriving (step 104). If so, processing repeats from step 102; if input has ended, processing terminates.
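The patent does not say how power fluctuations are turned into onsets, so the following is only a minimal sketch under stated assumptions. Power is measured over fixed, non-overlapping frames, and an onset is declared when a frame's power is at least `ratio` times that of the previous frame. The frame length, `ratio`, `floor`, and all function names are hypothetical choices, not values from the source. Step 103 then becomes slicing between consecutive onsets.

```python
def frame_power(x, frame_len):
    """Mean-square power of consecutive, non-overlapping frames of x."""
    return [sum(v * v for v in x[s:s + frame_len]) / frame_len
            for s in range(0, len(x) - frame_len + 1, frame_len)]


def detect_onsets(x, frame_len=256, ratio=4.0, floor=1e-8):
    """Sample indices where frame power rises at least `ratio` times over the
    previous frame (a crude power-rise onset detector; thresholds assumed)."""
    p = frame_power(x, frame_len)
    onsets = []
    for i in range(1, len(p)):
        prev = max(p[i - 1], floor)  # floor keeps a rise out of silence well defined
        if p[i] > floor and p[i] >= ratio * prev:
            onsets.append(i * frame_len)
    return onsets


def segment_at_onsets(x, onsets):
    """Split x into pieces running from one detected onset to the next."""
    bounds = [0] + [o for o in onsets if 0 < o < len(x)] + [len(x)]
    return [x[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]
```

A fixed-interval division, which the description allows as an alternative, would simply replace `detect_onsets` with a list of multiples of the interval.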
[0012]
As shown in FIG. 3, the candidate waveform selecting means 15 first reads the segmented input acoustic signal produced by the waveform segmenting means 13 (step 201). It then extracts the frequency components of each segmented input acoustic signal (step 202), and extracts sound features such as the fundamental frequency and power envelope (step 203). These features are used to select stored sound waveforms that may be contained in the segmented input acoustic signal.

The stored sound waveforms are held in advance in the waveform storage means 14, and they are examined one by one (steps 204 to 208). First, the means checks whether any stored waveform remains unexamined (step 204). If so, it selects one unexamined stored waveform (step 205). Next, it compares the fundamental frequency of that stored waveform with the frequencies of the components extracted in step 202, to see whether they fall within a certain range (step 206). If they do not, the stored waveform is unlikely to be contained in the segmented input acoustic signal, and processing returns to step 204.

The range can be determined, for example, as follows. Arrange the fundamental frequencies of the stored waveforms in order of magnitude. For a given fundamental frequency, the accepted range runs from halfway down to the next lower fundamental frequency up to halfway up to the next higher one, and anything in that range is taken as a candidate. For example, when a stored waveform is provided for every semitone, each semitone is about 6% higher in frequency than the one below it. Frequencies within about ±3% of the fundamental frequency are therefore taken as candidates.

If the frequency falls within the range in step 206, the means further checks whether the features are inconsistent, for example a pitch outside the instrument's playable range (step 207). If they are inconsistent, the stored waveform is unlikely to be contained in the segmented input acoustic signal, and processing returns to step 204. If they are consistent, the stored waveform is likely to be contained in the signal, so it is added to the candidate waveforms (step 208) and processing returns to step 204. When no unexamined stored waveform remains at step 204, the candidate waveforms found so far are output (step 209) and processing ends.
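The halfway-to-the-neighbour rule of step 206 can be sketched as follows. This is an illustration under assumptions, not the patent's implementation. Stored waveforms are assumed to be `(waveform_id, fundamental_hz)` pairs, and several instruments may share a pitch, so the intervals are computed over the distinct pitches. The end-of-table behaviour and the `fallback` tolerance are not specified in the source and are assumed here. The consistency check of step 207 is omitted, and all names are hypothetical.

```python
def acceptance_interval(pitches, i, fallback=0.03):
    """Interval accepted for pitches[i] (pitches sorted ascending): from halfway
    down to the next lower stored pitch to halfway up to the next higher one.
    At the ends of the table the inner gap is mirrored; a one-entry table uses
    +/- `fallback` (both of these are assumptions)."""
    f0 = pitches[i]
    low_gap = f0 - pitches[i - 1] if i > 0 else None
    high_gap = pitches[i + 1] - f0 if i + 1 < len(pitches) else None
    if low_gap is None:
        low_gap = high_gap if high_gap is not None else 2 * fallback * f0
    if high_gap is None:
        high_gap = low_gap
    return f0 - low_gap / 2, f0 + high_gap / 2


def select_candidates(stored, segment_f0s):
    """stored: list of (waveform_id, fundamental_hz). Returns the ids whose
    acceptance interval contains any fundamental measured in the segment."""
    pitches = sorted({f0 for _, f0 in stored})
    chosen = []
    for wid, f0 in stored:
        low, high = acceptance_interval(pitches, pitches.index(f0))
        if any(low <= f <= high for f in segment_f0s):
            chosen.append(wid)
    return chosen
```

With an equal-tempered table the semitone ratio is 2^(1/12) ≈ 1.0595. The upper half-gap is therefore about +2.97% and the lower one about -2.81%, which matches the "about ±3%" figure in the text.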
[0013]
The coefficient determining means 16 first reads the segmented input acoustic signal produced by the waveform segmenting means 13 (step 301). It then reads the candidate waveforms selected by the candidate waveform selecting means 15 (step 302). Next, it sets up simultaneous equations for the filter coefficients (step 303). The coefficients sought are those that minimize the mean square error between the segmented input acoustic signal and the sum of the filtered candidate waveforms. If an FIR filter is used, the result of filtering a candidate waveform can be written as

$$y_n(k) = \sum_{m=0}^{M-1} h_n(m)\, r_n(k-m) \qquad (1)$$

where:
- k is the sampled time index;
- n is the index counting the candidate waveforms;
- y_n(k) is the value of the filtered result at time k;
- h_n is the impulse response of the FIR filter;
- r_n is the candidate waveform;
- M is the filter order, that is, the number of taps in equation (1).

The mean square error between the sum of the filtered candidate waveforms and the segmented input acoustic signal can be written as

$$J = E\left[\left\{ z(k) - \sum_{n=0}^{N-1} y_n(k) \right\}^2\right] \qquad (2)$$

where z(k) is the value of the segmented input acoustic signal waveform at time k, N is the number of candidate waveforms, and E denotes the time average. A necessary condition for minimizing J is that the partial derivative $\partial J / \partial h_i(l)$ be zero for every candidate i and every tap l. Applying this condition yields N×M simultaneous linear equations:

$$\sum_{n=0}^{N-1} \sum_{m=0}^{M-1} E\left[ r_i(k-l)\, r_n(k-m) \right] h_n(m) = E\left[ r_i(k-l)\, z(k) \right], \quad i = 0,\dots,N-1,\; l = 0,\dots,M-1 \qquad (3)$$

Equation (3) is set up in step 303 and solved in step 304. Because the number of unknowns equals the number of equations, equation (3) can be solved by inverting the coefficient matrix. The resulting coefficients are output in step 305.
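Equation (3) is the normal-equation form of an ordinary least-squares problem, so a direct sketch follows. Stack one delayed copy r_n(k-m) per unknown h_n(m), form the time-averaged Gram matrix and right-hand side, and solve. This uses pure Python with Gaussian elimination in place of an explicit matrix inverse; both give the same solution for a nonsingular system. Samples before the segment start are assumed to be zero, since the source does not say how boundaries are handled. Function names are hypothetical.

```python
def delayed(r, m, k):
    """r(k - m), with samples before the start of the segment taken as zero."""
    j = k - m
    return r[j] if 0 <= j < len(r) else 0.0


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(aug[i][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        aug[col], aug[piv] = aug[piv], aug[col]
        for i in range(col + 1, n):
            f = aug[i][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[i][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def fir_coefficients(z, candidates, taps):
    """Least-squares taps h[n][m] for eqs. (1)-(3): minimise the time average
    of (z(k) - sum_n sum_m h_n(m) r_n(k - m))^2 over the segment."""
    K = len(z)
    unknowns = [(n, m) for n in range(len(candidates)) for m in range(taps)]
    # One delayed copy r_n(k - m) per unknown h_n(m).
    rows = [[delayed(candidates[n], m, k) for k in range(K)] for n, m in unknowns]
    # Left side E[r_i(k-l) r_n(k-m)], right side E[r_i(k-l) z(k)] (eq. 3).
    A = [[sum(p * q for p, q in zip(u, v)) / K for v in rows] for u in rows]
    b = [sum(p * q for p, q in zip(u, z)) / K for u in rows]
    h = solve_linear(A, b)
    return [h[n * taps:(n + 1) * taps] for n in range(len(candidates))]
```

If z is exactly a sum of FIR-filtered candidates, the true taps are recovered up to rounding. For M = 40 taps and several candidates, a numerical library's least-squares solver would be the practical choice.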
[0014]
As shown in FIG. 5, the filter operation means 17 proceeds as follows:
- It reads the filter coefficients obtained by the coefficient determining means 16 (step 401).
- It reads the candidate waveforms selected by the candidate waveform selecting means 15 (step 402).
- It performs the filter operation of equation (1), in the case of an FIR filter (step 403).
- It outputs the resulting waveforms (step 404).

These output waveforms are the signal waveforms separated for each sound source.
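Steps 401 to 404 amount to evaluating equation (1) once per candidate. Below is a minimal sketch, assuming zero history before the segment start. The function names are illustrative.

```python
def fir_filter(h, r):
    """Eq. (1): y(k) = sum_{m=0}^{M-1} h(m) r(k - m), zero history before k = 0."""
    return [sum(h[m] * r[k - m] for m in range(len(h)) if k - m >= 0)
            for k in range(len(r))]


def separate(coefficients, candidates):
    """Filter each candidate with its own taps (steps 403-404); each output is
    the signal attributed to that candidate's sound source."""
    return [fir_filter(h, r) for h, r in zip(coefficients, candidates)]
```

For example, `fir_filter([1.0, 0.5], [1.0, 2.0, 3.0])` gives `[1.0, 2.5, 4.0]`.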
[0015]
As described above, this invention selects candidates from the stored waveforms. It then determines the filter coefficients so that the square error between the segmented input acoustic signal and the sum of the filtered candidate stored waveforms is minimized. The resulting coefficients therefore match the characteristics of the segmented input acoustic signal. For any selected candidate stored waveform that is not contained in the segmented input acoustic signal, the filter characteristic passes nothing.

Conversely, suppose a candidate stored waveform is close to a source waveform present in the segmented input acoustic signal. The coefficients are then determined according to how that source waveform differs from the candidate. Filtering the candidate absorbs that difference, and a large output is obtained.
[0016]
FIGS. 6A and 6B give an example. They show the waveforms over the same time span (100 ms to 130 ms after the onset) when the pitch F4 is played at nearly the same strength on a piano made by Company A and on one made by Company B. The two waveforms are similar overall but differ from each other. Processing the waveform of FIG. 6B with a 40th-order FIR filter brings it considerably closer to the waveform of FIG. 6A, as shown in FIG. 6C.
[0017]
Accordingly, when the filter operation means 17 filters each candidate stored waveform:
- a candidate identical to a source waveform contained in the segmented input acoustic signal has a large mean output power, whose value corresponds to that source's mixing ratio;
- a candidate not contained in the segmented input acoustic signal has zero output;
- a candidate that differs somewhat from the corresponding source waveform is corrected adaptively, so its filtered output is still large.
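The decision rule implied here and in the abstract is that a source is present when its filtered output power is large. It can be sketched as a comparison of mean powers. The patent gives no threshold, so the relative threshold `rel_threshold` below is purely an assumption, as are the function names. The reported `share` is only a rough indicator of the mixing ratio, since powers of correlated components do not add exactly.

```python
def mean_power(y):
    """Time-averaged power of a waveform (0 for an empty one)."""
    return sum(v * v for v in y) / len(y) if y else 0.0


def present_sources(z, separated, names, rel_threshold=0.05):
    """Return (name, share) for candidates whose filtered output carries at
    least `rel_threshold` of the segment's mean power (threshold assumed)."""
    pz = mean_power(z)
    found = []
    for name, y in zip(names, separated):
        share = mean_power(y) / pz if pz > 0 else 0.0
        if share >= rel_threshold:
            found.append((name, share))
    return found
```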
[0018]
For the processing in the coefficient determining means 16 to be effective, the fundamental frequency and phase of the candidate stored waveform r should match those of the sound source contained in the segmented input acoustic signal z. This is because the filter used by the coefficient determining means 16 cannot change the frequency of a signal. It is therefore preferable to perform waveform synchronization, which continuously aligns the phase of the candidate stored waveform r with the phase of the corresponding source component in z. This synchronization can be performed, for example, as follows.
[0019]
As shown in FIG. 7, the segmented input acoustic waveform, that is, the reference waveform z, is read (step 301). Frequency analysis is then performed, for example with a band-pass filter bank (step 302). The analysis yields a power representation on the time-frequency plane, in which local maxima of power along the frequency axis (local peaks) are found (step 303). Local peaks that are continuous in time are then linked into tracks (step 304). Such a track of linked local peaks is called a frequency component, and it represents one of the periodicities present in the original waveform. These frequency components are output as periodicity information (step 305). The candidate stored waveform r is processed in the same way to obtain its periodicity information.
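Steps 303 and 304 can be sketched as follows, assuming the filter-bank output is already available as a list of frames, each a list of band powers. Peak picking compares each band with its two neighbours along the frequency axis. Linking is done greedily: a track continues into the next frame if a peak lies within `max_jump` bands of its last peak. The linking rule and all names are assumptions; the source only says that temporally continuous local peaks are connected.

```python
def local_peaks(frame, floor=0.0):
    """Bands whose power exceeds both neighbours along the frequency axis."""
    return [b for b in range(1, len(frame) - 1)
            if frame[b] > floor and frame[b] > frame[b - 1] and frame[b] >= frame[b + 1]]


def link_peaks(spectrogram, max_jump=1, floor=0.0):
    """Greedily join peaks in consecutive frames whose band indices differ by
    at most `max_jump`; each track of (frame, band) pairs is one 'frequency
    component'. Tracks that end early are listed first."""
    active, finished = [], []
    for t, frame in enumerate(spectrogram):
        peaks = local_peaks(frame, floor)
        next_active = []
        for track in active:
            last_band = track[-1][1]
            near = [b for b in peaks if abs(b - last_band) <= max_jump]
            if near:
                b = min(near, key=lambda p: abs(p - last_band))
                peaks.remove(b)  # each peak may extend only one track
                track.append((t, b))
                next_active.append(track)
            else:
                finished.append(track)
        next_active.extend([[(t, b)] for b in peaks])  # unmatched peaks start tracks
        active = next_active
    return finished + active
```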
[0020]
Next, as shown in FIG. 8, a fundamental frequency that is nearly the same in both the segmented input acoustic signal waveform z and the candidate stored waveform r is chosen from the periodicities present in them. The center frequency of a band-pass filter is set to that frequency. This band-pass filter is applied to z to obtain an output waveform (step 402), and the time series of the phase of this output is stored (step 403).

Because the band-pass filter output is close to a sinusoid, its phase can be obtained as follows. At each sign change in the sinusoidal time series, the zero-crossing time is computed from the sample values at times k and k+1 just before and after the change. The period of the sinusoid is then obtained, and from it the phase value (phase angle) at each time in the series.

The same band-pass filter used in step 402 is then applied to the candidate stored waveform (step 404), and the time series of the phase of its output is stored (step 405). Finally, the difference between the two phase time series stored in steps 403 and 405 is computed and output as a phase-difference time series (step 406).
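The zero-crossing phase estimate of steps 403 and 405, and the difference of step 406, can be sketched as below. The sketch assumes the band-pass filtering has already been done and uses only upward crossings, found by linear interpolation between samples k and k+1. Phase is assumed to advance linearly by 2π between successive crossings, and samples before the first crossing or after the last are extrapolated. Function names and the (-π, π] wrapping convention are assumptions.

```python
import bisect
import math


def zero_crossing_phase(y):
    """Instantaneous phase in [0, 2*pi) of a near-sinusoid y. Upward zero
    crossings are interpolated between samples k and k+1; between successive
    crossings the phase is assumed to grow linearly by 2*pi."""
    crossings = [k + y[k] / (y[k] - y[k + 1])
                 for k in range(len(y) - 1) if y[k] < 0.0 <= y[k + 1]]
    if len(crossings) < 2:
        raise ValueError("need at least two upward zero crossings")
    phase = []
    for k in range(len(y)):
        # Crossing interval containing k, clamped so the ends extrapolate.
        i = min(max(bisect.bisect_right(crossings, k) - 1, 0), len(crossings) - 2)
        c0, c1 = crossings[i], crossings[i + 1]
        phase.append((2 * math.pi * (k - c0) / (c1 - c0)) % (2 * math.pi))
    return phase


def phase_difference(p_ref, p_cand):
    """Step 406: p_ref - p_cand at each time, wrapped to (-pi, pi]."""
    return [math.pi - ((math.pi - (a - b)) % (2 * math.pi))
            for a, b in zip(p_ref, p_cand)]
```

For a reference that leads the candidate by 0.5 rad, the difference stays close to 0.5 throughout the interior of the signal.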
[0021]
This phase-difference time series is converted into a time-difference time series using equation (4):

$$\delta t(k) = \frac{1}{2\pi f}\,\delta p(k) \qquad (4)$$

where k is the time index, δt(k) is the time difference at time k, f is the center frequency of the band-pass filter, and δp(k) is the phase difference at time k in the phase-difference time series.
[0022]
Each sample of the candidate stored waveform time series r is delayed or advanced according to the corresponding time difference in the time-difference time series. The result is a candidate stored waveform whose instantaneous phase is synchronized with the fundamental frequency of the segmented input acoustic signal z.
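Equation (4) and the per-sample shift can be sketched as follows. The sketch assumes δp(k) is taken as reference phase minus candidate phase, so a positive value means z leads r and r must be advanced. It converts δt from seconds to samples by multiplying by the sampling rate `fs`, and realizes fractional shifts by linear interpolation. The sign convention, the interpolation method, and the names are assumptions; the source only says samples are delayed or advanced.

```python
import math


def phase_to_shift(dp, f, fs):
    """Eq. (4): delta_t = delta_p / (2*pi*f) seconds, returned here in samples
    (f: band-pass centre frequency in Hz, fs: sampling rate in Hz)."""
    return [p / (2 * math.pi * f) * fs for p in dp]


def align_candidate(r, shifts):
    """Return r'(k) = r(k + shifts[k]) by linear interpolation: a positive
    shift advances the candidate, a negative one delays it. Samples outside
    r are treated as zero."""
    out = []
    for k, s in enumerate(shifts):
        t = k + s
        j = math.floor(t)
        frac = t - j
        a = r[j] if 0 <= j < len(r) else 0.0
        b = r[j + 1] if 0 <= j + 1 < len(r) else 0.0
        out.append((1.0 - frac) * a + frac * b)
    return out
```

At fs = 48 kHz and f = 480 Hz, a phase difference of 0.5 rad corresponds to about 7.96 samples.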
[0023]
[Effects of the Invention] An experiment evaluating the recognition accuracy achieved with this invention is described next. As shown in FIG. 9, each test pattern consisted of three single tones sounding simultaneously. The patterns were of class 2: in at least one pair of simultaneously sounding tones, the fundamental frequencies are related by an integer multiple of 1.5.

To create the patterns, single tones of three acoustic instruments (flute, piano, and violin) were first recorded in a studio at every semitone (16 bit, 48 kHz) and stored on a computer. Tones were then chosen at random, subject to the class 2 constraint and to MIDI note numbers 60 to 74, and added together.
[0024]
The recognition rate R was defined by equation (5):

$$R = 100 \cdot \left( \frac{\mathrm{right} - \mathrm{wrong}}{\mathrm{total}} \cdot \frac{1}{2} + \frac{1}{2} \right) \qquad (5)$$

where:
- right is the number of output notes whose pitch and timbre were both correctly recognized;
- wrong is the number of output notes whose pitch, timbre, or both were incorrect;
- total is the total number of notes in the input (the correct answer).

Based on preliminary experiments, the FIR filter order was set to 40 for the template-filtering-ON condition. Template filtering here means the filtering of equation (1), and template filtering OFF means that the FIR filter order was set to 1.
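Equation (5) maps a perfect result to 100, an empty output to 50, and an output in which every note is wrong (with wrong = total and right = 0) to 0. A direct transcription follows; the function name is illustrative.

```python
def recognition_rate(right, wrong, total):
    """Eq. (5): R = 100 * ((right - wrong) / total * 1/2 + 1/2)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * ((right - wrong) / total * 0.5 + 0.5)
```

For example, `recognition_rate(3, 0, 3)` is 100.0 and `recognition_rate(0, 0, 3)` is 50.0.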
[0025]
In this experiment, the original templates could not be the same waveforms used to generate the test patterns, nor recordings of the same individual instruments. The waveforms would then match too closely for a meaningful evaluation. The template waveforms and the test-pattern waveforms were therefore recorded from different individual instruments, as shown in FIG. 10.
[0026]
FIG. 11 shows the experimental results. In this table, the lower-right cell (template filtering OFF, phase synchronization OFF) corresponds to sound source identification with a simple matched filter. Compared with the matched filter, the results clearly show the effectiveness of processing with the adaptive templates of this invention. The recognition rate is higher still when phase synchronization is applied.
[0027]
In addition to the benchmark test, a music recognition test was performed on a live musical performance. The test piece was "Hotaru no Hikari" (Auld Lang Syne), played on a violin, flute, and piano, each an individual instrument different from those in FIG. 10. The recognition rate R of the sound source identification processing was measured on this piece, and FIG. 12 shows the results. The values in the figure are recognition rates for the source identification processing alone. The qualitative trend matches that of the benchmark test, which indicates the effectiveness of the method of this invention.
[0028]
As described above, this invention separates acoustic signals with higher accuracy than known methods, even when many sound sources are present and those sources are diverse and variable.
FIG. 1 is a block diagram showing an example functional configuration of an acoustic signal separating apparatus to which the method of this invention is applied.
FIG. 2 is a flowchart showing the processing procedure of the waveform segmenting means 13.
FIG. 3 is a flowchart showing the processing procedure of the candidate waveform selecting means 15.
FIG. 4 is a flowchart showing the processing procedure of the coefficient determining means 16.
FIG. 5 is a flowchart showing the processing procedure of the filter operation means 17.
FIG. 6 shows examples of a candidate waveform, a segmented input acoustic signal waveform, and the segmented input acoustic signal waveform after adaptive filtering.
FIG. 7 is a flowchart showing the procedure for obtaining the periodicity information of a waveform.
FIG. 8 is a flowchart showing the procedure for obtaining a phase-difference time series.
FIG. 9 shows examples of the single-tone patterns used in the benchmark test.
FIG. 10 shows the musical instruments used in the experiment.
FIG. 11 shows the results of the benchmark test.
FIG. 12 shows the results of the sound recognition test.
Claims (5)
と、 上記区分された各区分入力音響信号ごとにこれに含まれ
ている可能性のあるすべての波形を、波形記憶手段に記
憶されている記憶波形から選択して候補波形を得る過程
と、 上記各候補波形にフィルタ演算した結果の和と上記区分
入力音響信号との誤差が最小になるように、上記フィル
タ演算のフィルタ係数を求める過程と、 上記各候補波形に対し、上記求めたフィルタ係数をもつ
フィルタ演算を行う過程とを有する音響信号分離方法。1. A process of temporally dividing an input audio signal, and all waveforms possibly contained in each of the divided input audio signals are stored in a waveform storage means. Obtaining candidate waveforms by selecting from the stored waveforms, and obtaining filter coefficients of the filter operations so that an error between the sum of the results of the filter operations performed on the candidate waveforms and the divided input sound signal is minimized. And performing a filter operation with the obtained filter coefficient on each of the candidate waveforms.
力音響信号の基本周波数を抽出する過程と、その抽出さ
れた基本周波数に対し、所定範囲に収まる基本周波数を
もつ記憶波形を上記波形記憶手段から選択する過程を有
することを特徴とする請求項1記載の音響信号分離方
法。2. The step of obtaining the candidate waveform includes the step of extracting a fundamental frequency of the divided input audio signal, and the step of storing a stored waveform having a fundamental frequency falling within a predetermined range with respect to the extracted fundamental frequency. 2. The method according to claim 1, further comprising the step of selecting from the means.
含まれる音の立上りを検出し、隣接する検出立上りの間
を区分とする過程であることを特徴とする請求項1又は
2記載の音響信号分離方法。3. The sound according to claim 1, wherein the step of dividing is a step of detecting a rising edge of a sound included in the input audio signal and dividing the interval between adjacent detected rising edges. Signal separation method.
4. The acoustic signal separation method according to any one of claims 1 to 3, wherein the fundamental frequency component of each candidate waveform is phase-synchronized with the fundamental frequency component of the divided input audio signal, and the phase-synchronized candidate waveforms are used in the step of obtaining the filter coefficients.
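As a simplified illustration of claim 4, the sketch below assumes the phase difference between the two fundamental components is constant over the segment (claim 5 describes a time-varying version). It measures each phase by correlating with a complex exponential at the known `f0`, converts the wrapped difference into a signed shift in samples, and resamples the candidate by linear interpolation. The estimator, the interpolation, and all names are assumptions.

```python
import math
from typing import List, Sequence


def fundamental_phase(x: Sequence[float], f0: float, fs: float) -> float:
    """Phase phi (rad) of the f0 component, modelling x[n] ~ A*cos(2*pi*f0*n/fs + phi)."""
    w = 2.0 * math.pi * f0 / fs
    re = sum(v * math.cos(w * n) for n, v in enumerate(x))
    im = -sum(v * math.sin(w * n) for n, v in enumerate(x))
    return math.atan2(im, re)


def _shift(x: Sequence[float], s: float) -> List[float]:
    """y[n] = x(n + s) by linear interpolation; samples outside x count as 0."""
    out = []
    for n in range(len(x)):
        t = n + s
        i = math.floor(t)
        frac = t - i
        a = x[i] if 0 <= i < len(x) else 0.0
        b = x[i + 1] if 0 <= i + 1 < len(x) else 0.0
        out.append((1.0 - frac) * a + frac * b)
    return out


def phase_align(x: Sequence[float], z: Sequence[float], f0: float, fs: float) -> List[float]:
    """Shift candidate x so that its f0 component is in phase with that of segment z."""
    dphi = fundamental_phase(z, f0, fs) - fundamental_phase(x, f0, fs)
    dphi = math.atan2(math.sin(dphi), math.cos(dphi))  # wrap to (-pi, pi]
    shift = dphi * fs / (2.0 * math.pi * f0)  # signed time shift in samples
    return _shift(x, shift)
```

Aligning phases before the least-squares fit means short FIR filters need not model large delays between a stored waveform and its occurrence in the input.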
5. The acoustic signal separation method according to claim 4, wherein the phase synchronization is performed by acquiring a time series of the phase difference between the fundamental frequency component of each candidate waveform and the fundamental frequency component of the divided input audio signal, converting each phase difference in that series into a signed time difference to obtain a time series of time-shift amounts, and moving the sample of the corresponding candidate waveform at the corresponding time by each time-shift amount in that series.
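The three operations of claim 5 can be sketched as follows. How the phase-difference time series is acquired is not specified; here it is assumed to come from a hypothetical frame-wise estimator that holds each frame's value for every sample in the frame. Each difference is wrapped and converted to a signed shift `dphi * fs / (2*pi*f0)` in samples, and each output sample is read from the candidate at its own shifted time by linear interpolation (also an assumption).

```python
import math
from typing import List, Sequence


def framewise_phase_diff(x: Sequence[float], z: Sequence[float], f0: float, fs: float,
                         frame: int) -> List[float]:
    """Per-sample series of phi_z - phi_x for the f0 component (x and z of equal
    length), estimated frame by frame and held constant within each frame."""
    w = 2.0 * math.pi * f0 / fs
    out: List[float] = []
    for start in range(0, len(z), frame):
        stop = min(start + frame, len(z))
        # Correlate both signals with exp(-j*w*n) using absolute sample indices.
        cz = sum(z[n] * complex(math.cos(w * n), -math.sin(w * n)) for n in range(start, stop))
        cx = sum(x[n] * complex(math.cos(w * n), -math.sin(w * n)) for n in range(start, stop))
        prod = cz * cx.conjugate()
        out.extend([math.atan2(prod.imag, prod.real)] * (stop - start))
    return out


def phase_diffs_to_shifts(dphi: Sequence[float], f0: float, fs: float) -> List[float]:
    """Convert each phase difference (rad) into a signed time shift in samples."""
    shifts = []
    for d in dphi:
        d = math.atan2(math.sin(d), math.cos(d))  # wrap to (-pi, pi]
        shifts.append(d * fs / (2.0 * math.pi * f0))
    return shifts


def move_samples(x: Sequence[float], shifts: Sequence[float]) -> List[float]:
    """y[n] = x(n + shifts[n]), reading x between samples by linear interpolation
    (samples outside x count as 0)."""
    y = []
    for n, s in enumerate(shifts):
        t = n + s
        i = math.floor(t)
        frac = t - i
        a = x[i] if 0 <= i < len(x) else 0.0
        b = x[i + 1] if 0 <= i + 1 < len(x) else 0.0
        y.append((1.0 - frac) * a + frac * b)
    return y
```

Because every sample gets its own shift, the candidate can follow a fundamental whose phase drifts within the segment (for example, slight detuning or vibrato), which a single constant shift cannot track.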
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP03181397A JP3501199B2 (en) | 1997-02-17 | 1997-02-17 | Acoustic signal separation method |
Publications (2)
Publication Number | Publication Date |
---|---|
JPH10228296A true JPH10228296A (en) | 1998-08-25 |
JP3501199B2 JP3501199B2 (en) | 2004-03-02 |
Family ID: 12341540
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
JP03181397A (Expired - Fee Related; granted as JP3501199B2) | Acoustic signal separation method | 1997-02-17 | 1997-02-17 |
Country Status (1)
Country | Link |
---|---|
JP (1) | JP3501199B2 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR101386645B1 (en) * | 2007-09-19 | 2014-04-17 | 삼성전자주식회사 | Apparatus and method for purceptual audio coding in mobile equipment |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS6421498A (en) * | 1987-07-17 | 1989-01-24 | Nec Corp | Automatically scoring system and apparatus |
JPH0526722A (en) * | 1991-07-16 | 1993-02-02 | Bridgestone Corp | Method and device for diagnosing contribution of sound source or vibration source |
JPH0580777A (en) * | 1991-09-20 | 1993-04-02 | Hitachi Ltd | Active sound elimination device for noise in car room |
JPH05100660A (en) * | 1991-10-11 | 1993-04-23 | Brother Ind Ltd | Automatic score drawing device |
JPH05181464A (en) * | 1991-12-27 | 1993-07-23 | Sony Corp | Musical sound recognition device |
JPH0667654A (en) * | 1992-08-19 | 1994-03-11 | Brother Ind Ltd | Automatic music transcription device |
JPH0675562A (en) * | 1992-08-28 | 1994-03-18 | Brother Ind Ltd | Automatic musical note picking-up device |
1997
- 1997-02-17: JP03181397A (JP3501199B2) filed in JP; status: not active, Expired - Fee Related
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2004107319A1 (en) * | 2003-05-30 | 2004-12-09 | National Institute Of Advanced Industrial Science And Technology | Method and device for removing known acoustic signal |
GB2418577A (en) * | 2003-05-30 | 2006-03-29 | Nat Inst Of Advanced Ind Scien | Method and device for removing known acoustic signal |
GB2418577B (en) * | 2003-05-30 | 2007-10-17 | Nat Inst Of Advanced Ind Scien | Method and device for removing known acoustic signal |
JP2009527801A (en) * | 2006-02-21 | 2009-07-30 | 株式会社ソニー・コンピュータエンタテインメント | Speech recognition using speaker adaptation and registration by pitch |
Also Published As
Publication number | Publication date |
---|---|
JP3501199B2 (en) | 2004-03-02 |
Similar Documents
Publication | Title |
---|---|
Klapuri et al. | Robust multipitch estimation for the analysis and manipulation of polyphonic musical signals |
EP1579419B1 (en) | Audio signal analysing method and apparatus |
US8535236B2 (en) | Apparatus and method for analyzing a sound signal using a physiological ear model |
JP3964792B2 (en) | Method and apparatus for converting a music signal into note reference notation, and method and apparatus for querying a music bank for a music signal |
JP2002529772A (en) | Fundamental wave high-speed discovery method |
JP4973537B2 (en) | Sound processing apparatus and program |
Hargreaves et al. | Structural segmentation of multitrack audio |
Wang et al. | Adaptive time-frequency scattering for periodic modulation recognition in music signals |
Taenzer et al. | Investigating CNN-based Instrument Family Recognition for Western Classical Music Recordings |
Dannenberg | Listening to “Naima”: An automated structural analysis of music from recorded audio |
JP3508978B2 (en) | Sound source type discrimination method of instrument sounds included in music performance |
JP3501199B2 (en) | Acoustic signal separation method |
Fitria et al. | Music transcription of javanese gamelan using short time fourier transform (stft) |
US20040158437A1 (en) | Method and device for extracting a signal identifier, method and device for creating a database from signal identifiers and method and device for referencing a search time signal |
Foo et al. | Application of fast filter bank for transcription of polyphonic signals |
Siki et al. | Time-frequency analysis on gong timor music using short-time fourier transform and continuous wavelet transform |
Gainza et al. | Harmonic sound source separation using FIR comb filters |
JPH1173199A (en) | Acoustic signal encoding method and record medium readable by computer |
Unnikrishnan | An efficient method for tonic detection from south Indian classical music |
Chudy et al. | Towards music performer recognition using timbre |
Peimani | Pitch correction for the human voice |
Hossain et al. | Frequency component grouping based sound source extraction from mixed audio signals using spectral analysis |
Ren | Computational modeling of musical performance expression: feature extraction, pattern analysis, and applications |
Wu | Guitar Sound Analysis and Pitch Detection |
JP4398049B2 (en) | Time-series signal analysis method and acoustic signal encoding method |
Legal Events
Code | Title | Description |
---|---|---|
RD01 | Notification of change of attorney | Japanese intermediate code: A7426; effective date: 2003-11-25 |
A61 | First payment of annual fees (during grant procedure) | Japanese intermediate code: A61; effective date: 2003-11-25 |
FPAY | Renewal fee payment (event date is renewal date of database) | Payment until: 2007-12-12; year of fee payment: 4 |
FPAY | Renewal fee payment (event date is renewal date of database) | Payment until: 2008-12-12; year of fee payment: 5 |
FPAY | Renewal fee payment (event date is renewal date of database) | Payment until: 2009-12-12; year of fee payment: 6 |
FPAY | Renewal fee payment (event date is renewal date of database) | Payment until: 2010-12-12; year of fee payment: 7 |
FPAY | Renewal fee payment (event date is renewal date of database) | Payment until: 2010-12-12; year of fee payment: 7 |
FPAY | Renewal fee payment (event date is renewal date of database) | Payment until: 2011-12-12; year of fee payment: 8 |
FPAY | Renewal fee payment (event date is renewal date of database) | Payment until: 2011-12-12; year of fee payment: 8 |
FPAY | Renewal fee payment (event date is renewal date of database) | Payment until: 2012-12-12; year of fee payment: 9 |
FPAY | Renewal fee payment (event date is renewal date of database) | Payment until: 2012-12-12; year of fee payment: 9 |
FPAY | Renewal fee payment (event date is renewal date of database) | Payment until: 2013-12-12; year of fee payment: 10 |
LAPS | Cancellation because of no payment of annual fees | |