JP6668306B2

JP6668306B2 - Sampling frequency estimation device

Info

Publication number: JP6668306B2
Application number: JP2017201493A
Authority: JP
Inventors: 祐高橋
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2017-10-18
Filing date: 2017-10-18
Publication date: 2020-03-18
Anticipated expiration: 2034-04-03
Also published as: JP2018010712A

Description

The present invention relates to a technique for synchronizing a plurality of signals obtained by separately sampling the same waveform.

In recent years, recording devices that make digital recording easy, such as IC recorders, and devices that can capture video at the same time as digital audio, such as smartphones, have become widespread. Here, digital recording means recording a sound signal in the form of a sample sequence obtained by sampling a sound waveform. For example, a smartphone may be used to record video and sound of a live performance from a location away from the performer while an IC recorder placed near the performer records the performance sound, and the sound recorded by the smartphone is then replaced with the sound recorded by the IC recorder (or the latter is superimposed on the former) for playback. In general, even if all recording devices are set to the same sampling frequency in their device settings, minute variations arise in the actual sampling frequency of each device. This is because the clock generators that determine the sampling frequency do not operate at exactly the same clock frequency. Therefore, when the same sound waveform is digitally recorded separately and independently by a plurality of recording devices, the sampling timing drifts progressively from device to device, even if the recording start timings are aligned, because the sampling frequency differs for each device. Techniques for correcting such a sampling frequency deviation include those disclosed in the prior art documents Patent Document 1, Non-Patent Document 1, and Non-Patent Document 2.

Non-Patent Document 1 discloses a technique in which a transmitter sends out a reference signal (pilot signal), and the receiver detects and corrects the frequency shift caused by the sampling frequency deviation from the reference signal contained in the received signal. Patent Document 1 discloses a correction technique for the case where the sampling frequency differs between the side that sends a measurement signal (such as a TSP signal) and the side that receives it when measuring the transfer characteristics of a sound field. In the technique disclosed in Patent Document 1, the TSP signal is sent repeatedly in order to suppress the influence of noise during measurement, the measured TSP signals are cut out at regular intervals, and the sampling frequency deviation is estimated and corrected by detecting the phase difference between the TSP signals cut out in this way.

Non-Patent Document 2 discloses a technique that uses statistical signal processing to correct sampling frequency deviations among a plurality of recording devices. In the technique disclosed in Non-Patent Document 2, a reference signal is first chosen from among the recorded signals captured by the plurality of recording devices. A signal whose sampling frequency deviates from that of the reference signal is then modeled statistically, and the sampling frequency deviation is estimated by fitting the signals other than the reference signal to the statistical model.

Patent Document 1: JP 2002-101500 A

Non-Patent Document 1: Yasumatsu Matsuoka, Yusuke Nakajima, Ken Yoshimura, "Acoustic information transmission method for acquiring information with microphone of mobile terminal", NTT Docomo Technical Journal, vol. 14, No. 2, 2006
Non-Patent Document 2: Shigeki Miyabe, Nobutaka Ono, and Shoji Makino, "BLIND COMPENSATION OF INTER-CHANNEL SAMPLING FREQUENCY MISMATCH WITH MAXIMUM LIKELIHOOD ESTIMATION IN STFT DOMAIN," Proc. ICASSP 2013, pp. 674-678

However, the techniques disclosed in Non-Patent Document 1 and Patent Document 1 are subject to many constraints and lack versatility. For example, the technique of Non-Patent Document 1 requires a device that generates the reference signal (pilot signal), and the reference signal also affects the recorded signal. The technique of Patent Document 1, on the other hand, can be used only under the condition that the same signal is sent repeatedly at regular intervals. In contrast, the technique of Non-Patent Document 2 does not lack versatility, but it requires a large amount of computation, so the calculation time needed to complete the estimation of the sampling frequency deviation is long.

The present invention has been made in view of the problems described above, and its object is to provide a highly versatile technique that can synchronize a plurality of signals obtained by separately and independently sampling the same waveform in a shorter calculation time than before.

To solve the above problems, the present invention provides a sampling frequency estimation device comprising: a time shift amount calculation unit that takes one of a plurality of signals obtained by separately and independently sampling the same waveform as a reference signal and one of the remaining signals as a correction target signal, calculates the correlation between the two signals for each frame while shifting one of the reference signal and the correction target signal along the time axis, and calculates the time shift amount between the two signals for each frame according to the result; an error calculation unit that calculates, for each frame, a first estimated value, which is an estimate of the error in the sampling frequency of the correction target signal in that frame, from the time shift amount calculated by the time shift amount calculation unit; and a statistical processing unit that applies statistical processing to the first estimated values calculated for each frame by the error calculation unit to calculate and output a second estimated value, which is an estimate of the sampling frequency error over the entire correction target signal. Since the error in the sampling frequency of the correction target signal is the deviation of that sampling frequency from the sampling frequency of the reference signal, the sampling frequency of the correction target signal can be obtained from the error and the sampling frequency of the reference signal. Calculating an estimate of the error (that is, estimating the error) is therefore equivalent to estimating the sampling frequency of the correction target signal.

With such a sampling frequency estimation device, one of a plurality of signals obtained by separately and independently sampling the same waveform is taken as the reference signal and each of the remaining signals as a correction target signal; an estimate of the sampling frequency error is calculated for each correction target signal, and the error is corrected using an existing technique such as time axis companding, so that each correction target signal can be synchronized with the reference signal. The sampling frequency estimation device of the present invention requires no pilot signal, nor do the signals need to be output repeatedly over a fixed period, so it is more versatile than the techniques disclosed in Non-Patent Document 1 and Patent Document 1. Furthermore, as described in detail later, the sampling frequency estimation device of the present invention can calculate the sampling frequency error of the correction target signal in a shorter calculation time than the technique disclosed in Non-Patent Document 2, so that a plurality of signals obtained by separately and independently sampling the same waveform can be synchronized in a shorter calculation time than before.

As a specific configuration of the statistical processing unit, the statistical processing may consist of a first statistical filter process and a second statistical filter process. The first statistical filter process excludes, from the first estimated values calculated for each frame by the error calculation unit (that is, the rough per-frame error estimates), outliers that are statistically presumed to contain large errors. The second statistical filter process is either a filter process that smooths the group of first estimated values remaining after the outliers have been excluded (for example, calculating their mean) or a filter process that selects a representative value from that group (for example, selecting their median). The result of the second statistical filter process is output as the second estimated value (the estimate of the sampling frequency error over the entire correction target signal).

A specific example of the first statistical filter process is to sort the first estimated values calculated for each frame by the error calculation unit in order of magnitude and to remove as outliers either a predetermined number of values from each end, or the values that fall outside a range determined by the values at a predetermined position from each end. For example, if the predetermined number is 1/4 of the total number of first estimated values calculated by the error calculation unit, values below the first quartile and values above the third quartile are excluded as outliers. If the range is instead determined by applying weights to the first and third quartiles, outliers are excluded by the so-called interquartile range method.
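The two statistical filter processes can be sketched as follows. This is a minimal illustration, not the specification's implementation: the function names, the weight of 1.5 (a common convention for the interquartile range method), and the quartile interpolation used by Python's `statistics.quantiles` are all assumptions.

```python
import statistics

def remove_outliers_iqr(values, weight=1.5):
    """First filter (assumed form): keep values inside
    [Q1 - weight * IQR, Q3 + weight * IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - weight * iqr, q3 + weight * iqr
    return [v for v in values if low <= v <= high]

def overall_error(per_frame_errors, use_median=False):
    """Second filter: mean (smoothing) or median (representative value)
    of the per-frame estimates that survived outlier removal."""
    kept = remove_outliers_iqr(per_frame_errors)
    return statistics.median(kept) if use_median else statistics.fmean(kept)

# Hypothetical per-frame error estimates with two obviously bad frames.
estimates = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 50.0, -40.0]
print(overall_error(estimates), overall_error(estimates, use_median=True))
```

Setting `weight` to 0 reduces the range to the span between the first and third quartiles, which corresponds to the simpler variant that discards everything below the first quartile and above the third.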

In a more preferred aspect, the time shift amount calculation unit excludes from the calculation of the time shift amount any frame in which the power of the reference signal and the correction target signal is below a predetermined threshold. If the threshold is set to a suitable value, frames that do not contain the reference signal at sufficient strength and frames that do not contain the correction target signal at sufficient strength are excluded from the calculation of the time shift amount. A time shift amount calculated from such a frame would contain large errors. An error estimate calculated from such a time shift amount is likely to be excluded as an outlier by the first statistical filter process, so calculating the time shift amount would be wasted effort in the first place. This aspect avoids such wasted computation in the time shift amount calculation unit, further shortening the processing time required to estimate the sampling frequency of the correction target signal while allowing the sampling frequency error to be calculated with high accuracy.
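A minimal sketch of this power gate is shown below. It follows the explanation that a frame lacking either signal at sufficient strength is skipped; measuring power as the mean square of the samples, and the function names and threshold value, are assumptions made for illustration.

```python
def frame_power(frame):
    """Mean-square power of one frame (assumed definition of 'power')."""
    return sum(s * s for s in frame) / len(frame)

def frames_worth_correlating(ref_frames, tgt_frames, min_power):
    """Indices of frame pairs in which both the reference frame and the
    correction target frame reach min_power."""
    return [i for i, (r, t) in enumerate(zip(ref_frames, tgt_frames))
            if frame_power(r) >= min_power and frame_power(t) >= min_power]

# Hypothetical frames: pair 0 has a silent reference, pair 2 a near-silent target.
ref = [[0.0, 0.0], [1.0, -1.0], [1.0, 1.0]]
tgt = [[1.0, 1.0], [1.0, 1.0], [0.01, 0.0]]
print(frames_worth_correlating(ref, tgt, min_power=0.1))
```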

In another preferred aspect, the time shift amount calculation unit excludes from the calculation of the time shift amount any frame for which a value representing the correlation between the reference signal and the correction target signal (for example, the maximum of the cross-correlation values calculated while shifting the time) falls below a predetermined threshold. If the threshold is set to a suitable value, no time shift amount is calculated for frames of the correction target signal that correlate poorly with the corresponding frame of the reference signal. A time shift amount calculated for such a frame would contain large errors, and an error estimate calculated from it is likely to be excluded as an outlier by the first statistical filter process, so calculating the time shift amount would be wasted effort in the first place. This aspect likewise avoids wasted computation in the time shift amount calculation unit, further shortening the processing time required to estimate the sampling frequency of the correction target signal while allowing the sampling frequency to be calculated with high accuracy.
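One way to realize this correlation gate is sketched below. Normalizing the cross-correlation by the frame energies, so that a single threshold between 0 and 1 works across loud and quiet frames, is an assumption of this sketch; the text only requires some value representing the correlation. The function names and the lag range are likewise hypothetical.

```python
import math

def peak_normalized_correlation(ref_frame, tgt_frame, max_lag):
    """Largest normalized cross-correlation over lags in [-max_lag, max_lag]."""
    n = len(ref_frame)
    norm = math.sqrt(sum(r * r for r in ref_frame) *
                     sum(t * t for t in tgt_frame))
    if norm == 0.0:
        return 0.0  # a silent frame cannot be matched
    best = -1.0
    for lag in range(-max_lag, max_lag + 1):
        c = sum(ref_frame[t] * tgt_frame[t + lag]
                for t in range(n) if 0 <= t + lag < n)
        best = max(best, c / norm)
    return best

def is_reliable(ref_frame, tgt_frame, max_lag, min_peak):
    """True if the frame pair correlates well enough to estimate a shift."""
    return peak_normalized_correlation(ref_frame, tgt_frame, max_lag) >= min_peak
```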

As another aspect for solving the above problems, a program may be provided that causes a general-purpose computer, such as a CPU (Central Processing Unit), to function as the time shift amount calculation unit, the error calculation unit, and the statistical processing unit. Operating a general-purpose computer according to such a program makes it function as the sampling frequency estimation device of the present invention. Such a program may be provided, for example, by writing it to a computer-readable recording medium such as a CD-ROM (Compact Disk-Read Only Memory) and distributing the medium, or by distributing it as a download over a telecommunication line such as the Internet.

As yet another aspect for solving the above problems, a sampling frequency estimation method may be provided that comprises: a time shift amount calculation step of taking one of a plurality of signals obtained by separately and independently sampling the same waveform as a reference signal and one of the remaining signals as a correction target signal, calculating the cross-correlation function of the two signals for each frame while shifting one of the reference signal and the correction target signal along the time axis, and calculating the time shift amount between the two signals for each frame according to the result; an error calculation step of calculating, for each frame, an estimate of the error in the sampling frequency of the correction target signal in that frame from the time shift amount calculated in the time shift amount calculation step; and a statistical processing step of applying statistical processing to the per-frame error estimates calculated in the error calculation step to calculate and output an estimate of the sampling frequency error of the correction target signal. A program that causes a general-purpose computer such as a CPU to execute the time shift amount calculation step, the error calculation step, and the statistical processing step may also be provided.

FIG. 1 is a block diagram showing a configuration example of a sampling frequency estimation device 10 according to an embodiment of the present invention and a configuration example of a signal processing system 1 including the sampling frequency estimation device 10.
The remaining drawings are, in order:
A diagram showing the per-frame sampling frequency estimation results obtained by the sampling frequency estimation device 10.
A diagram for explaining quartiles and the interquartile range.
A diagram for explaining the effect of outlier removal based on quartiles.
A diagram showing the results of an evaluation experiment on the present embodiment and the technique disclosed in Non-Patent Document 2.
A diagram showing the results of an evaluation experiment on the present embodiment and the technique disclosed in Non-Patent Document 2.

Embodiments of the present invention are described below with reference to the drawings.
(A: Configuration)
FIG. 1 is a block diagram showing a configuration example of a sampling frequency estimation device 10 according to an embodiment of the present invention and a configuration example of a signal processing system 1 including the sampling frequency estimation device 10. The signal processing system 1 receives sound signals (sample sequences) X_n(t) (n = 1 to N), each obtained by sampling the same sound waveform separately and independently with one of N recording devices (N being a natural number of 2 or more), such as smartphones or IC recorders. Existing techniques may be used as appropriate to synchronize the recording start timing of the N recording devices. For example, if the recording devices can communicate over a telecommunication line such as the Internet, the recording start timing may be aligned through that communication; if they can communicate by other means such as Bluetooth (registered trademark), the recording start timing may be aligned through that means.

Although the sampling frequencies of the N recording devices are all set to the same value (for example, 44.1 kHz), the clock generators of the devices do not operate at exactly the same clock frequency, so the actual sampling frequencies differ slightly. Consequently, even if the devices start recording at the same time and the N sound signals are played back with their starts aligned, the sounds gradually drift apart, and the drift grows as playback proceeds. The signal processing system 1 of the present embodiment estimates and corrects the sampling frequency errors among the N sound signals so that they can be synchronized.
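To give a sense of scale, the sketch below computes how far two recordings drift apart for a given clock mismatch. The 1 Hz mismatch and the ten-minute duration are purely illustrative assumptions, not figures from the specification.

```python
def drift_in_samples(fs_ref, fs_other, seconds):
    """Number of samples by which two recordings diverge after `seconds`."""
    return (fs_other - fs_ref) * seconds

fs_ref = 44100.0     # nominal rate of the reference recorder (Hz)
fs_other = 44101.0   # hypothetical actual rate of a second recorder (Hz)
duration = 600.0     # ten minutes of recording (s)

drift = drift_in_samples(fs_ref, fs_other, duration)
print(f"drift: {drift:.0f} samples = {1000 * drift / fs_ref:.1f} ms")
```

Even a mismatch this small accumulates to an offset of more than ten milliseconds over the recording, which is why aligning only the start of the signals is not enough.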

As shown in FIG. 1, the signal processing system 1 includes a sampling frequency estimation device 10 and a time axis companding device 20. The N sound signals are supplied to the sampling frequency estimation device 10. The sampling frequency estimation device 10 takes one of the N sound signals as the reference signal and each of the remaining N-1 sound signals as a correction target signal, estimates for each correction target signal the deviation (that is, the error) of its sampling frequency from the sampling frequency of the reference signal, and supplies data indicating the estimation results to the time axis companding device 20. The time axis companding device 20 applies time axis companding to each correction target signal so as to cancel the sampling frequency error estimated for it. The N sound signals are thereby synchronized. Existing techniques may be used as appropriate for the time axis companding algorithm in the time axis companding device 20. In the present embodiment, having the sampling frequency estimation device 10 execute the processing that characterizes the embodiment makes it possible to estimate the sampling frequency error of each correction target signal in a shorter calculation time than before while ensuring high versatility. The following description focuses on the sampling frequency estimation device 10, which most clearly exhibits the features of the present embodiment.
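The text leaves time axis companding to existing techniques. As a rough illustration only, the sketch below resamples a correction target signal onto the reference signal's sample grid by linear interpolation. The function name, the choice of linear interpolation, and the convention that the actual rate equals the reference rate plus the estimated error are assumptions; a practical system would more likely use a band-limited resampler.

```python
def resample_linear(samples, fs_actual, fs_ref):
    """Resample `samples`, captured at fs_actual, onto a grid at fs_ref."""
    ratio = fs_actual / fs_ref  # input samples advanced per output sample
    n_out = int((len(samples) - 1) / ratio) + 1
    out = []
    for m in range(n_out):
        pos = m * ratio          # fractional position in the input
        i = int(pos)
        frac = pos - i
        nxt = samples[i + 1] if i + 1 < len(samples) else samples[i]
        out.append((1.0 - frac) * samples[i] + frac * nxt)
    return out

# Hypothetical use: the estimated error is +1 Hz relative to 44.1 kHz.
fs_ref = 44100.0
estimated_error = 1.0
corrected = resample_linear([0.0, 0.5, 1.0, 0.5, 0.0],
                            fs_ref + estimated_error, fs_ref)
```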

As shown in FIG. 1, the sampling frequency estimation device 10 includes a short-time Fourier transform unit 100 (denoted "STFT" in FIG. 1), a time shift amount calculation unit 110, an error calculation unit 120, and a statistical processing unit 130. Each unit shown in FIG. 1 may be a hardware module built from electronic circuits or a software module realized by operating a CPU (Central Processing Unit) according to a signal processing program.

The STFT unit 100 divides each of the sound signals X_n(t) (n = 1 to N) input to the sampling frequency estimation device 10 into frames of a predetermined number of samples, applies a short-time Fourier transform to each frame to convert it into a frequency-domain signal X_n(f) (f being a variable representing frequency; the same applies below), and supplies the result to the time shift amount calculation unit 110. Any well-known transform algorithm may be used in the STFT unit 100.
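The framing step can be sketched as follows. Non-overlapping frames and dropping a trailing partial frame are assumptions of this sketch; the text only specifies frames of a predetermined number of samples. Each frame's start index is kept because the later error calculation refers to the position of the frame's first sample.

```python
def split_into_frames(samples, frame_len):
    """Cut a sample sequence into consecutive, non-overlapping frames.

    Returns (start_index, frame) pairs; a trailing partial frame is dropped.
    """
    return [(s, samples[s:s + frame_len])
            for s in range(0, len(samples) - frame_len + 1, frame_len)]

frames = split_into_frames(list(range(10)), frame_len=4)
```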

The time shift amount calculation unit 110 selects one of the N sound signals as the reference signal and selects each of the remaining N-1 sound signals in turn as the correction target signal. It calculates the cross-correlation function of the two signals for each frame while shifting one of them along the time axis, and from it calculates the time shift amount between the two signals for each frame. The processing executed by the time shift amount calculation unit 110 is described in detail below, taking as an example the case where X_ref(f) is selected as the reference signal and X_k(f) (k ≠ ref) as the correction target signal.

The time shift amount calculation unit 110 first calculates, for each frame, the cross-correlation function C(τ) of the reference signal X_ref(f) and the correction target signal X_k(f) (k ≠ ref) while varying the value of τ. The cross-correlation function C(τ) is calculated because estimating the sample point at which the cross-correlation function is maximized is a generally known method of correcting a time shift. In general, given two analog signals x(t) and y(t) in the time domain, the cross-correlation function C(τ) is expressed by Equation 1 below. For digital signals, it is expressed by Equation 2 below.
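Equations 1 and 2 are not reproduced in this text. Under the usual definition of cross-correlation they would take a form such as the following; the limits of integration and summation, the direction of the shift, and the absence of a normalization factor are assumptions rather than the specification's exact notation.

```latex
% Assumed form of Equation 1 (analog signals)
C(\tau) = \int_{-\infty}^{\infty} x(t)\, y(t + \tau)\, dt

% Assumed form of Equation 2 (digital signals)
C(\tau) = \sum_{t} x(t)\, y(t + \tau)
```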

The cross-correlation function C(τ) calculated by Equation 1 or Equation 2 corresponds to shifting one of the two signals by τ along the time axis and taking the inner product. If the assumption that "the same signal is contained somewhere in both signals" holds, the time shift amount between the two signals can be estimated by finding the τ that maximizes the cross-correlation function. In terms of the Fourier transforms X(f) and Y(f) of the two signals, the cross-correlation function C(τ) is calculated by Equation 3 below. IFFT() on the right-hand side of Equation 3 is the operator representing the inverse Fourier transform, and X^*(f) denotes the complex conjugate of X(f).
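Equation 3 is likewise not reproduced in this text. From the description of its right-hand side (an inverse Fourier transform applied to the complex conjugate of X(f) and to Y(f)), it presumably has the standard form below; the exact notation is an assumption.

```latex
% Assumed form of Equation 3
C(\tau) = \mathrm{IFFT}\!\left( X^{*}(f)\, Y(f) \right)
```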

The time shift amount calculation unit 110 of the present embodiment calculates the cross-correlation function C(τ) of each frame of the reference signal X_ref(f) and the correction target signal X_k(f) (k ≠ ref) according to Equation 3 while varying the value of τ. Specifically, the time shift amount calculation unit 110 takes the complex conjugate of the signal X_ref(f) for the i-th frame of the reference signal as X^*(f) on the right-hand side of Equation 3 and the signal X_k(f) (k ≠ ref) for the i-th frame of the correction target signal as Y(f), evaluates Equation 3 while varying τ, and identifies the τ at which the cross-correlation function C(τ) is maximized. It then supplies the τ so identified to the error calculation unit 120 as the estimated time shift amount N_ki for the i-th frame of the correction target signal X_k(f) (that is, the τ that maximizes the cross-correlation function C(τ)). The same applies to the other frames.
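The per-frame lag search can be sketched as follows, assuming Equation 3 has the standard form C(τ) = IFFT(X*(f) Y(f)). To stay self-contained the sketch uses a naive O(n²) discrete Fourier transform in place of an FFT routine, so it illustrates the computation but not the speed advantage. The function names, the circular treatment of the frame, and the mapping of large lags to negative values are assumptions.

```python
import cmath

def dft(x, inverse=False):
    """Naive discrete Fourier transform (stand-in for an FFT routine)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * f * t / n)
               for t in range(n))
           for f in range(n)]
    return [v / n for v in out] if inverse else out

def frame_lag(ref_frame, tgt_frame):
    """Return the lag tau that maximizes C(tau) = IDFT(conj(X_ref) * X_tgt).

    The correlation is circular; lags above n/2 are reported as negative.
    """
    n = len(ref_frame)
    X = dft(ref_frame)
    Y = dft(tgt_frame)
    c = dft([xf.conjugate() * yf for xf, yf in zip(X, Y)], inverse=True)
    best = max(range(n), key=lambda tau: c[tau].real)
    return best if best <= n // 2 else best - n

x = [0, 0, 1, 2, 1, 0, 0, 0]
y = x[-3:] + x[:-3]   # x delayed by 3 samples (circularly)
print(frame_lag(x, y))
```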

In the present embodiment, the cross-correlation function C(τ) is calculated by the frequency-domain operation of Equation 3 rather than the time-domain operation of Equation 1 or Equation 2 because this can be advantageous in terms of computational cost. The STFT unit 100 is provided so that the time shift amount calculation unit 110 can calculate the cross-correlation function C(τ) by Equation 3. Accordingly, if the time shift amount calculation unit 110 calculates the cross-correlation function C(τ) by Equation 1 or Equation 2, the STFT unit 100 may be omitted.

The error calculation unit 120 calculates, for each frame, an estimated value E_ki (hereinafter, the first estimated value) of the error of the sampling frequency of the correction target signal with respect to the sampling frequency f_s of the reference signal, based on the time shift amount N_ki of each frame given from the time shift amount calculation unit 110. For example, when the time shift amount for the i-th frame is N_ki and the first sample of the i-th frame of the correction target signal X_k(f) is the S_ki-th sample from the beginning of that signal, the error calculation unit 120 calculates the first estimated value E_ki for the i-th frame according to Expression 4 below and gives it to the statistical processing unit 130. As described above, the time shift amount N_ki calculated by the time shift amount calculation unit 110 is a shift measured from the beginning of the reference signal; when the cross-correlation function is obtained for each frame by the STFT, as in the present embodiment, it is a shift measured from the beginning of the frame. For this reason, the estimated value E_ki of the sampling frequency error of the correction target signal X_k(f), based on the cross-correlation function in the i-th frame, is given by Expression 4 below.
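Expression 4 itself is not reproduced in this text. As a rough illustration only, the following Python sketch converts a per-frame shift into a frequency error under an assumed linear-drift model: if a shift of N_ki samples has accumulated by the S_ki-th sample, the error is approximately f_s × N_ki / S_ki to first order. This formula, the function name, and the sign convention are assumptions made for the example and may differ from the patent's Expression 4.

```python
def frame_error_hz(n_ki, s_ki, f_s):
    """First-order sampling frequency error for one frame.

    ASSUMPTION: linear drift, E_ki ~= f_s * N_ki / S_ki. This is an
    illustrative model, not the patent's Expression 4.
    """
    if s_ki <= 0:
        raise ValueError("s_ki must be positive: no drift can be measured at sample 0")
    return f_s * n_ki / s_ki


# Usage: 3 samples of drift after 44100 samples at f_s = 44.1 kHz.
print(frame_error_hz(3, 44100, 44100.0))
```

Under this model, a deviation of 3 Hz (the value used in the experiment of FIG. 2) corresponds to 3 samples of drift per second of audio, which shows how small the per-frame shifts being measured are.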

The statistical processing unit 130 applies statistical processing to the first estimated values E_ki calculated for each frame by the error calculation unit 120, calculates an estimated value E_k (hereinafter, the second estimated value) of the sampling frequency error over the entire correction target signal, and outputs it to the time-axis companding device 20. As shown in FIG. 1, the statistical processing unit 130 includes a first statistical filter processing unit 130a and a second statistical filter processing unit 130b. That is, the statistical processing executed by the statistical processing unit 130 consists of the processing executed by the first statistical filter processing unit 130a and the processing executed by the second statistical filter processing unit 130b. The processing executed by each of these units is as follows.

The first statistical filter processing unit 130a executes first statistical filter processing that excludes, from the first estimated values E_ki calculated for each frame by the error calculation unit 120, outliers that are statistically presumed to contain large errors. The first estimated values E_ki often contain many errors, because each E_ki is only a rough estimate of the sampling frequency deviation between the two signals, based solely on the information in the i-th frame of the correction target signal X_k(f) and of the reference signal X_ref(f). FIG. 2 shows the per-frame sampling frequency estimates obtained in an experiment in which the sampling frequency was artificially shifted by 3 Hz. Once the error of the sampling frequency of the correction target signal relative to the sampling frequency of the reference signal is known, the sampling frequency of the correction target signal can be calculated from the sampling frequency of the reference signal and that error, so estimating the error and estimating the sampling frequency of the correction target signal are equivalent. The large variation in the per-frame estimates seen in FIG. 2 is thought to arise because it is difficult to estimate, with high accuracy, the very small time shift caused by a sampling frequency deviation.

In the present embodiment, the first statistical filter processing removes as outliers those first estimated values E_ki, among the values calculated by the error calculation unit 120, that deviate greatly from the others. Specifically, processing based on quartiles is employed as the first statistical filter processing. Quartiles are the cut points that divide the data to be processed, sorted by magnitude, into four equal parts; from smallest to largest they are called the first quartile, the second quartile, and the third quartile (see FIG. 3). The difference between the third quartile and the first quartile is called the interquartile range (IQR). The interquartile range is one indicator of the spread of the samples.

More specifically, the first statistical filter processing unit 130a first sorts the first estimated values E_ki calculated for each frame by the error calculation unit 120 in order of magnitude. Next, the first statistical filter processing unit 130a excludes as outliers those first estimated values E_ki that are smaller than the first quartile or larger than the third quartile of the sorted result, and passes the remainder (that is, the group of first estimated values E'_ki containing no outliers) to the second statistical filter processing unit 130b. The operation o() that detects outliers is given by Expression 5 below. Specifically, each of the first estimated values E_ki calculated by the error calculation unit 120 is substituted for e(n) in Expression 5, and if the value of o() is 1, that first estimated value E_ki is excluded as an outlier. q_L and q_H denote the first quartile and the third quartile, respectively.
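The image of Expression 5 is not reproduced in this text. Based on the description above (a value is an outlier when it is below the first quartile q_L or above the third quartile q_H), the operation can plausibly be written as follows; the exact typography of the original expression is an assumption.

```latex
o\bigl(e(n)\bigr) =
\begin{cases}
1, & e(n) < q_L \ \text{or}\ e(n) > q_H \\
0, & \text{otherwise}
\end{cases}
```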

In the present embodiment, processing based on quartiles is employed as the first statistical filter processing, but processing that uses the interquartile range in addition to the quartiles may be used instead. Specifically, the operation o() of Expression 6 may be performed in place of the operation o() of Expression 5 to decide whether a value is an outlier. The operation of Expression 6 subtracts the weighted IQR from the first quartile and adds it to the third quartile. With α = 0, Expression 6 coincides with Expression 5. Calculation with α = 1.5 is widely known; for example, the ends of the upper and lower whiskers of the box plot shown in FIG. 4 are calculated in this way.
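The quartile rule and its IQR-weighted variant can be sketched in Python as follows. Expression 6 is not reproduced in this text, so the thresholds q_L − α·IQR and q_H + α·IQR are inferred from the description above. The function name `outlier_flags` is hypothetical, and the quartile convention (`method="inclusive"` of the standard library) is an assumption, since the text does not say how quartiles are interpolated.

```python
from statistics import quantiles


def outlier_flags(first_estimates, alpha=0.0):
    """Return o(e(n)) for each value: 1 = outlier, 0 = keep.

    alpha = 0.0 reproduces the plain quartile rule (Expression 5);
    alpha = 1.5 gives the familiar box-plot whisker rule.
    """
    # ASSUMPTION: inclusive quartile interpolation; the source does not specify one.
    q_l, _, q_h = quantiles(first_estimates, n=4, method="inclusive")
    iqr = q_h - q_l
    low, high = q_l - alpha * iqr, q_h + alpha * iqr
    return [1 if (e < low or e > high) else 0 for e in first_estimates]


# Usage: per-frame estimates with one gross error.
values = [1, 2, 3, 4, 5, 6, 7, 8, 100]
print(outlier_flags(values, alpha=0.0))
print(outlier_flags(values, alpha=1.5))
```

With α = 0 roughly half of the values are discarded, which is aggressive but leaves only the central mass of the estimates; a larger α keeps more frames at the cost of admitting values farther from the center.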

The second statistical filter processing unit 130b executes second statistical filter processing that selects a representative value from the group of first estimated values E'_ki from which outliers have been excluded by the first statistical filter processing unit 130a (specifically, filter processing that selects the median), and gives the result to the time-axis companding device 20 as the second estimated value E_k. The maximum value, the minimum value, or the like could conceivably be used as the representative value, but the median is considered the most preferable. Filter processing that smooths the group of first estimated values E'_ki (that is, processing that calculates their mean) may also be adopted as the second statistical filter processing; however, according to experiments conducted by the applicant, the filter processing that selects the median gave better results. For this reason, filter processing that selects the median is employed in the present embodiment.
The above is the configuration of the sampling frequency estimation device 10.
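The two-stage statistical processing described above (quartile-based outlier removal followed by median selection) can be sketched as one small Python function. The function name, the inclusive quartile convention, and the fallback used when nothing survives the first stage are assumptions made for this example.

```python
from statistics import median, quantiles


def second_estimate(first_estimates):
    """Quartile trimming (first filter) followed by the median (second filter)."""
    if len(first_estimates) < 2:
        # Too few frames to form quartiles; fall back to the plain median.
        return median(first_estimates)
    # ASSUMPTION: inclusive quartile interpolation.
    q_l, _, q_h = quantiles(first_estimates, n=4, method="inclusive")
    kept = [e for e in first_estimates if q_l <= e <= q_h]
    # ASSUMPTION: if trimming removes everything, use all values instead.
    return median(kept if kept else first_estimates)


# Usage: per-frame errors (Hz) scattered around 3 Hz with two gross errors.
per_frame = [2.9, 3.1, 3.0, -40.0, 3.2, 55.0, 2.8, 3.0, 3.05]
print(second_estimate(per_frame))
```

Because the median is insensitive to the exact position of the trimming thresholds, the result changes little if a few borderline frames are kept or dropped, which is consistent with the applicant's observation that the median worked better than the mean.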

(B: Effects of the embodiment)
According to the present embodiment, one of N sound signals is used as the reference signal and each of the remaining N−1 sound signals is used as a correction target signal. The sampling frequency estimation device 10 estimates, for each correction target signal, the error of its sampling frequency with respect to the sampling frequency of the reference signal, and time-axis companding is applied to the correction target signal so as to eliminate that error, whereby the N sound signals are synchronized. To evaluate the effect of the present embodiment, the applicant conducted evaluation experiments that compared it with the technique disclosed in Non-Patent Document 2 in terms of the estimation performance for the sampling frequency error and the computation speed (the computation time required to complete the calculation of the estimated sampling frequency error). The outline of the evaluation experiments is as follows.

First, the sound signals of ten commercially available songs (genre: pop; duration of each song: 10 seconds), sampled at 44.1 kHz with 16-bit resolution, were used as original signals. Each original signal was used unchanged as the reference signal, and a signal obtained by artificially resampling the original signal (±5 Hz) was used as the correction target signal. The sampling frequency error of each correction target signal was estimated by the sampling frequency estimation device 10 of the present embodiment and by the technique disclosed in Non-Patent Document 2. In this evaluation experiment, a computer with a 3.4 GHz Core i7-3770 CPU and 32 GB of RAM was used as the sampling frequency estimation device 10, and the STFT unit 100 and the other units were implemented in MATLAB (registered trademark), which is numerical analysis software from The MathWorks of the United States. The method disclosed in Non-Patent Document 2 was likewise implemented in C/C++ and in MATLAB (registered trademark) on the same computer and executed. The FFT length was 4096 samples, a Hamming window with a window length of 4096 samples was used as the analysis window, shift sizes (that is, the update unit of τ) of 8192, 4096, 2048, and 1024 samples were used when calculating the cross-correlation function C(τ), and the data range used was (3/8)×T to (5/8)×T, where T is the number of data points.

FIG. 5(a) shows the experimental results on estimation performance for the present embodiment, and FIG. 5(b) shows the experimental results on estimation performance for the method disclosed in Non-Patent Document 2. As is clear from comparing FIG. 5(a) with FIG. 5(b), the technique disclosed in Non-Patent Document 2 is superior in terms of best-case performance (that is, its estimation error is smaller). However, the present embodiment also achieves performance in a practical range, for example recording for 2 hours (7200 seconds) and keeping the time shift of the corrected correction target signal relative to the reference signal to 5 milliseconds or less (keeping the estimation error of the sampling frequency within 0.03 Hz). The present embodiment therefore causes no problem in practical use. FIG. 5(a) also shows that the present embodiment achieves comparable estimation performance regardless of the shift size. The shift size affects the amount of computation. In other words, the experimental results in FIG. 5(a) mean that the present embodiment can achieve performance in a practical range even when the amount of computation is reduced.
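The correspondence between the 0.03 Hz tolerance and the 5 ms bound quoted above can be checked with a short calculation. The passage does not state which sampling frequency it assumes; the following uses the 44.1 kHz of the evaluation experiment, with T the recording length and Δf the residual frequency error.

```latex
\Delta t \approx T \cdot \frac{\Delta f}{f_s}
        = 7200\,\mathrm{s} \times \frac{0.03\,\mathrm{Hz}}{44100\,\mathrm{Hz}}
        \approx 4.9\,\mathrm{ms} < 5\,\mathrm{ms}
```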

FIG. 6(a) shows the experimental results on computation speed for the present embodiment, and FIG. 6(b) shows the experimental results on computation speed for the method disclosed in Non-Patent Document 2. As is clear from comparing FIG. 6(a) with FIG. 6(b), the method of the present embodiment is far faster than the method disclosed in Non-Patent Document 2 (that is, the computation time required to complete the estimation of the sampling frequency deviation is shorter), and even the MATLAB (registered trademark) implementation achieves a computation speed exceeding that of the C/C++ implementation of the method disclosed in Non-Patent Document 2. Summarizing the above experimental results, it can be concluded that the present embodiment achieves estimation performance in a practical range in a shorter computation time than the technique disclosed in Non-Patent Document 2.

As described above, according to the present embodiment, a plurality of sound signals obtained by separately sampling the same sound waveform can be synchronized in a shorter computation time than with the technique disclosed in Non-Patent Document 2. In addition, the present embodiment can synchronize the signals using only the sampled (in other words, recorded) sound signals and does not require a pilot signal, so it is more versatile than the technique disclosed in Non-Patent Document 1. Furthermore, in the present embodiment the sound signals to be synchronized need not have been transmitted repeatedly, so it is more versatile than the technique disclosed in Patent Document 1. That is, according to the present embodiment, a plurality of signals obtained by separately and independently sampling the same waveform can be synchronized in a shorter computation time than before, and with high versatility.

(C: Modifications)
One embodiment of the present invention has been described above, but the following modifications may of course be made to this embodiment.
(1) In the above embodiment, the plurality of signals input to the sampling frequency estimation device 10 are sound signals obtained by separately and independently sampling the same sound waveform. However, the signals input to the sampling frequency estimation device 10 need only be obtained by separately and independently sampling the same waveform, and are not limited to sound signals. Also, in the above embodiment, processing using quartiles is employed as the first statistical filter processing. Alternatively, for example, the estimated values calculated for each frame by the error calculation unit 120 may be sorted in order of magnitude and divided into three equal parts, and values smaller than the value at the lower cut point or larger than the value at the upper cut point may be treated as outliers. In short, any processing will do as long as, when the first estimated values calculated for each frame by the error calculation unit 120 are sorted in order of magnitude, it treats as outliers either a predetermined number of values from each end, or the values that fall outside a range determined by the values at predetermined positions counted from each end.
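The generalized rule at the end of modification (1), discarding a predetermined number of values from each end of the sorted estimates, can be sketched as follows. The function name `trim_by_rank` and the parameter `k` are hypothetical; choosing `k` as one third of the number of frames approximates the three-way split mentioned above.

```python
def trim_by_rank(first_estimates, k):
    """Drop the k smallest and k largest values; return the rest in sorted order."""
    ordered = sorted(first_estimates)
    if k < 0 or 2 * k >= len(ordered):
        raise ValueError("k must satisfy 0 <= 2*k < number of estimates")
    return ordered[k:len(ordered) - k]


# Usage: seven per-frame estimates, two discarded from each end.
print(trim_by_rank([5, 1, 9, 3, 7, 100, -50], k=2))
```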

(2) The statistical processing executed by the statistical processing unit 130 of the above embodiment is based on a deterministic approach and consists of the first statistical filter processing, which excludes outliers using the quartile method or the like, and the second statistical filter processing, which selects a representative value (the median in the above embodiment) from the result of the first statistical filter processing and uses that value as the estimated sampling frequency error over the entire correction target signal. However, processing that statistically models the first estimated values calculated for each frame with an exponential family distribution and calculates the second estimated value by estimating the model parameters may be adopted as the statistical processing instead. Specifically, for example, the first estimated values are modeled with a Laplace distribution, the shape of the distribution is determined by estimating its parameters, the mode of the determined distribution is obtained, and that mode is used as the second estimated value, thereby estimating the most likely value of the sampling frequency error.
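A minimal Python sketch of the Laplace-based variant follows. It relies on the standard maximum-likelihood estimates for a Laplace distribution: the location parameter is the sample median, the scale parameter is the mean absolute deviation from that median, and the mode of the fitted distribution equals the location. The text does not say which estimation method is intended, so the use of maximum likelihood here, like the function name, is an assumption.

```python
from statistics import median


def fit_laplace(first_estimates):
    """Maximum-likelihood fit of a Laplace distribution.

    Returns (mu, b): mu is the location (also the mode), b is the scale.
    ASSUMPTION: maximum likelihood is used; the source names no estimator.
    """
    mu = median(first_estimates)
    b = sum(abs(e - mu) for e in first_estimates) / len(first_estimates)
    return mu, b


# Usage: the mode of the fitted distribution serves as the second estimated value.
mode, scale = fit_laplace([1.0, 2.0, 3.0, 4.0, 10.0])
print(mode, scale)
```

Under this particular fit, the mode coincides with the median of the per-frame estimates, so the model-based variant and the median-selecting filter of the embodiment give closely related answers; the fitted scale additionally indicates how widely the per-frame estimates are spread.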

(3) The time shift amount calculation unit 110 of the above embodiment calculates the time shift amount based only on the τ at which the cross-correlation function C(τ) is maximized. Alternatively, up to M values of τ may be retained as candidates in descending order of C(τ), and the time shift amount may be calculated based on these M values of τ, for example from their average. All values of τ for which C(τ) is equal to or greater than a predetermined threshold may also be used as candidates for the time shift amount. In that case, a normalized cross-correlation function should be used to avoid the influence of signal power.
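The top-M variant of modification (3) can be sketched as follows, assuming the cross-correlation values are already available as a mapping from lag τ to C(τ). The function name, the dictionary representation, and the use of a plain (unweighted) average are assumptions made for this example.

```python
def shift_from_top_m(correlation_by_lag, m):
    """Average the m lags with the largest cross-correlation values.

    correlation_by_lag: dict mapping lag tau -> C(tau) (assumed representation).
    """
    if m < 1 or not correlation_by_lag:
        raise ValueError("need m >= 1 and at least one lag")
    top = sorted(correlation_by_lag, key=correlation_by_lag.get, reverse=True)[:m]
    return sum(top) / len(top)


# Usage: the two strongest peaks are at tau = 1 and tau = 2.
c = {-1: 0.2, 0: 0.5, 1: 0.9, 2: 0.8, 3: 0.1}
print(shift_from_top_m(c, m=2))
```

Averaging neighboring peaks yields a fractional lag, which can help when the true shift falls between two integer sample positions.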

(4) The time shift amount calculation unit 110 may exclude, from the calculation of the time shift amount, frames in which the power of the reference signal or of the correction target signal is less than a predetermined threshold. If the threshold is set to an appropriate value, frames that do not contain the reference signal with sufficient strength and frames that do not contain the correction target signal with sufficient strength are excluded from the calculation. Such frames contribute little to the estimation of the sampling frequency deviation in the first place, and a time shift amount calculated for such a frame would contain large errors. A first estimated value calculated from such a time shift amount is likely to be excluded as an outlier by the first statistical filter processing unit 130a, so calculating the time shift amount would itself be wasted effort. According to this modification, the sampling frequency error can be calculated with high accuracy while avoiding wasteful computation in the time shift amount calculation unit 110.
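The power gating of modification (4) can be sketched as follows. Frame power is taken here as the mean squared sample value, which is an assumption since the text does not define it; the function names and the threshold value are likewise illustrative.

```python
def frame_power(frame):
    """Mean squared amplitude of one frame (assumed definition of power)."""
    return sum(s * s for s in frame) / len(frame)


def usable_frame_indices(ref_frames, k_frames, threshold):
    """Indices of frames where BOTH signals reach the power threshold."""
    return [i for i, (r, k) in enumerate(zip(ref_frames, k_frames))
            if frame_power(r) >= threshold and frame_power(k) >= threshold]


# Usage: frame 1 is too quiet in the reference, frame 2 in the target.
ref_frames = [[0.5, -0.5], [0.0, 0.01], [0.4, 0.4]]
k_frames = [[0.3, 0.3], [0.2, 0.2], [0.0, 0.0]]
print(usable_frame_indices(ref_frames, k_frames, threshold=0.01))
```

Only the surviving frame indices would then be passed on to the cross-correlation search, so quiet frames cost a single pass over their samples rather than a full lag search.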

(5) The time shift amount calculation unit 110 may exclude, from the calculation of the time shift amount, frames in which the maximum value of the cross-correlation function C(τ) falls below a predetermined threshold. If the threshold is set to an appropriate value, no time shift amount is calculated from a cross-correlation function that falls below the threshold. A time shift amount calculated from such a cross-correlation function would contain large errors, and a first estimated value calculated from it is likely to be excluded as an outlier, so calculating the time shift amount would itself be wasted effort. According to this modification as well, the sampling frequency can be calculated with high accuracy while avoiding wasteful computation in the time shift amount calculation unit 110.

(6) In the above embodiment, the frame size used to divide the reference signal and the correction target signal into frames is fixed. In such a mode, however, the sample offset between the two signals grows as the frame number increases, and calculating the cross-correlation function C(τ) may become meaningless (or C(τ) may become impossible to calculate). Therefore, processing may be stopped when the maximum value of C(τ) falls below a predetermined threshold and the user of the sampling frequency estimation device 10 may be notified in some way, or the frame size may be increased and the reference signal and the correction target signal may be re-divided into frames.

(7) In the above embodiment, the time shift amount between the two signals in each frame is calculated by calculating the cross-correlation function of the reference signal and the correction target signal for each frame. However, the two signals may of course be normalized before their cross-correlation is calculated, and the time shift amount may be calculated from that result. Alternatively, processing that calculates the difference signal of the two signals while shifting one of the reference signal and the correction target signal relative to the other in the time axis direction may be executed for each frame, the maximum amplitude (or the power) of the difference signal may be calculated as a value representing the correlation between the two signals, and the time shift amount of the two signals for each frame may be calculated from that result. A value representing the correlation between the two signals may also be calculated for each frame as the ratio of the sum signal to the difference signal of the two signals, and the time shift amount for each frame may be calculated from that result. Further, a value representing the correlation between the reference signal and the correction target signal may be calculated for each frame by pattern matching, and the time shift amount for each frame may be calculated from that result. In short, any mode will do as long as the correlation between the reference signal and the correction target signal is calculated for each frame and the time shift amount of the two signals for each frame is calculated according to the result.

DESCRIPTION OF SYMBOLS: 1 ... signal processing system, 10 ... sampling frequency estimation device, 20 ... time-axis companding device, 100 ... STFT unit, 110 ... time shift amount calculation unit, 120 ... error calculation unit, 130 ... statistical processing unit, 130a ... first statistical filter processing unit, 130b ... second statistical filter processing unit.

Claims

A synchronization method comprising:
an error calculation step of calculating, for each frame, an error of a sampling frequency of a correction target signal, which is a second sound signal, with respect to a sampling frequency of a reference signal, which is a first sound signal;
a statistical processing step of applying statistical processing to the errors calculated for each frame in the error calculation step to estimate a sampling frequency error over the entire correction target signal; and
a synchronization step of applying time-axis companding to the correction target signal so that the error estimated in the statistical processing step is eliminated, thereby synchronizing the correction target signal with the reference signal,
wherein the reference signal and the correction target signal are sound signals obtained by separately and independently sampling the same sound waveform.

The synchronization method according to claim 1, further comprising a time shift amount calculation step of calculating the correlation between the reference signal and the correction target signal for each frame while shifting one of the two signals relative to the other in the time axis direction, and calculating the time shift amount between the two signals for each frame according to the calculation result,
wherein, in the error calculation step, the error of the sampling frequency of the correction target signal in each frame is calculated from the time shift amount calculated in the time shift amount calculation step.

A synchronization device comprising:
error calculation means for calculating, for each frame, an error of a sampling frequency of a correction target signal, which is a second sound signal, with respect to a sampling frequency of a reference signal, which is a first sound signal;
statistical processing means for applying statistical processing to the errors calculated for each frame by the error calculation means to estimate a sampling frequency error over the entire correction target signal; and
synchronization means for applying time-axis companding to the correction target signal so that the error estimated by the statistical processing means is eliminated, thereby synchronizing the correction target signal with the reference signal,
wherein the reference signal and the correction target signal are sound signals obtained by separately and independently sampling the same sound waveform.

The synchronization device according to claim 3, further comprising time shift amount calculation means for calculating the correlation between the reference signal and the correction target signal for each frame while shifting one of the two signals relative to the other in the time axis direction, and calculating the time shift amount between the two signals for each frame according to the calculation result,
wherein the error calculation means calculates the error of the sampling frequency of the correction target signal in each frame from the time shift amount calculated by the time shift amount calculation means.