JP2007110451A

JP2007110451A - Speech signal adjustment apparatus, speech signal adjustment method, and program

Info

Publication number: JP2007110451A
Application number: JP2005299357A
Authority: JP
Inventors: Yasushi Sato; 寧佐藤
Original assignee: Kenwood KK
Current assignee: Kenwood KK
Priority date: 2005-10-13
Filing date: 2005-10-13
Publication date: 2007-04-26

Abstract

PROBLEM TO BE SOLVED: To provide a speech signal adjustment apparatus and the like for adjusting sound quality simply, at high speed, or accurately.
SOLUTION: A sub-band data input section 1 acquires a sub-band data group representing temporal changes in the strength of the fundamental frequency component or harmonics of speech, while a calibration data generating section 3 receives the speech reproduced by a speech signal reproducing section 5 and generates a calibration sub-band data group representing that speech. A speech quality adjustment section 4 adjusts the strength of each sub-band data item in the sub-band data group on the basis of the calibration sub-band data group or, when a speech quality designation data input section acquires speech quality designation data, according to that data. The speech signal reproducing section 5 reproduces the speech represented by the sub-band data group after the adjustment.
COPYRIGHT: (C)2007,JPO&INPIT

Description

The present invention relates to an audio signal adjustment device, an audio signal adjustment method, and a program.

When sound is reproduced from audio data, the quality of the reproduced sound is generally adjusted in order to correct differences between the original sound represented by the audio data and the sound actually reproduced, to remove noise from the reproduced sound, or to add special auditory effects to the original sound.

Conventionally, sound quality has been adjusted by having a sound reproduction device equipped with an equalizer reproduce sound from test audio data, receiving the reproduced sound, determining the frequency characteristic of the equalizer from the difference between the waveform of the received sound and the waveform represented by the test audio data, and operating the equalizer so that it has the determined frequency characteristic (see, for example, Patent Document 1). Test audio data representing, for example, an impulse waveform or a sweep waveform has been used.
Patent Document 1: JP 2001-197585 A

However, conventional equalizers have complicated configurations and high manufacturing costs. Moreover, an equalizer that changes its frequency characteristic accurately to match the determined characteristic becomes complicated in configuration, and manufacturing one has been difficult both technically and economically. In addition, when an impulse waveform is used as the test sound, the band of the reproduced sound becomes extremely wide, so its frequency characteristic is hard to identify accurately, and the frequency characteristic determined for the equalizer therefore tends to be inappropriate. When a sweep waveform is used as the test sound, identifying the frequency characteristic of the reproduced sound takes a long time.

The present invention has been made in view of the above circumstances, and its object is to provide an audio signal adjustment device, an audio signal adjustment method, and a program for adjusting sound quality simply, at high speed, or accurately.

To achieve the above object, an audio signal adjustment device according to a first aspect of the present invention comprises:
audio signal acquisition means for acquiring, from outside, an audio signal group consisting of audio signals each representing a temporal change in the intensity of a fundamental frequency component or a harmonic component of a sound;
audio signal adjustment means for changing the intensity of an audio signal included in the audio signal group acquired by the audio signal acquisition means; and
waveform generation means for generating, on the basis of the audio signal group whose audio signal intensities have been changed, a signal representing the waveform of the sound represented by that audio signal group.

The audio signal adjustment device may further comprise designation data acquisition means for acquiring, from outside, designation data that specifies how the intensity of an audio signal included in the audio signal group acquired by the audio signal acquisition means is to be changed.
In this case, the audio signal adjustment means may change the intensity of the audio signal included in the audio signal group acquired by the audio signal acquisition means in the manner specified by the designation data acquired by the designation data acquisition means.

The audio signal adjustment device may further comprise calibration audio signal generation means that receives a sound and generates a calibration audio signal group consisting of calibration audio signals representing temporal changes in the intensities of the fundamental frequency component and harmonic components of a to-be-processed audio signal representing the waveform of that sound.
In this case, the audio signal adjustment means may determine the post-change value of the intensity of an audio signal included in the audio signal group acquired by the audio signal acquisition means on the basis of the intensity of that audio signal and the intensity of the calibration audio signal representing a component of substantially the same frequency as that audio signal, and may change the intensity of the audio signal according to the result of the determination.

The calibration audio signal generation means may comprise:
means for generating a signal representing the waveform of the received sound and processing that signal into a pitch waveform signal by making the time lengths of the sections corresponding to a unit pitch of the signal substantially equal; and
means for generating, as the calibration audio signals, signals representing temporal changes in the intensities of the fundamental frequency component and harmonic components of the pitch waveform signal.

An audio signal adjustment method according to a second aspect of the present invention comprises:
acquiring, from outside, an audio signal group consisting of audio signals each representing a temporal change in the intensity of a fundamental frequency component or a harmonic component of a sound;
changing the intensity of an audio signal included in the acquired audio signal group; and
generating, on the basis of the audio signal group whose audio signal intensities have been changed, a signal representing the waveform of the sound represented by that audio signal group.

A program according to a third aspect of the present invention causes a computer to function as:
audio signal acquisition means for acquiring, from outside, an audio signal group consisting of audio signals each representing a temporal change in the intensity of a fundamental frequency component or a harmonic component of a sound;
audio signal adjustment means for changing the intensity of an audio signal included in the audio signal group acquired by the audio signal acquisition means; and
waveform generation means for generating, on the basis of the audio signal group whose audio signal intensities have been changed, a signal representing the waveform of the sound represented by that audio signal group.

According to the present invention, an audio signal adjustment device, an audio signal adjustment method, and a program for adjusting sound quality simply, at high speed, or accurately are realized.

Hereinafter, an embodiment of the present invention will be described with reference to the drawings, taking a sound quality adjustment device as an example.
FIG. 1 is a diagram showing the configuration of this sound quality adjustment device. As shown in the figure, the sound quality adjustment device comprises a subband data input unit 1, a sound quality designation data input unit 2, a calibration data generation unit 3, a sound quality adjustment unit 4, and an audio playback unit 5.

The subband data input unit 1 comprises, for example, a recording medium driver (such as a flexible disk drive or an MO drive) that reads data recorded on a recording medium (such as a flexible disk or an MO (Magneto Optical disk)), or a communication control device that consists of a USB (Universal Serial Bus) interface circuit or the like and controls data exchange with the outside.

The subband data input unit 1 acquires a subband data group representing a sound and supplies it to the sound quality adjustment unit 4. The subband data group contains the 0th subband data, which represents the temporal change in the intensity of the fundamental frequency component of the sound, and n subband data items (n is a natural number), the 1st through the n-th, which represent the temporal changes in the intensities of n harmonic components of the sound. When the intensity of the fundamental frequency component (or harmonic component) does not change over time, the corresponding subband data represents that intensity in the form of a DC signal.

In addition, when the sound represented by the subband data group has had the phases of its sections aligned by phase-shifting each section corresponding to a unit pitch, and pitch information for that sound can be acquired, the subband data input unit 1 also acquires the pitch information and supplies it to the audio playback unit 5. The pitch information represents the original value of the length (pitch length) of each section of the sound represented by the subband data group.

The sound quality designation data input unit 2 comprises, for example, an input device such as a keyboard or a pointing device, and a processor such as a CPU (Central Processing Unit). When an operator performs an operation to input sound quality designation data, the sound quality designation data input unit 2 acquires the sound quality designation data according to that operation and supplies the acquired data to the sound quality adjustment unit 4.

The sound quality designation data specifies how the intensity of each subband data item in the subband data group acquired by the subband data input unit 1 should be changed. For example, it consists of data representing the coefficient by which the intensity of the component represented by each subband data item is to be multiplied.

The calibration data generation unit 3 comprises a calibration audio input unit 31, a pitch extraction unit 32, and a subband analysis unit 33.

The calibration audio input unit 31 comprises a sound receiving device such as a microphone, an AF (Audio Frequency) amplifier, a sampler, an A/D (Analog-to-Digital) converter, a PCM encoder, and the like. The calibration audio input unit 31 amplifies the audio signal representing the sound received by its microphone, samples and A/D-converts it, generates calibration audio data representing the sampled audio signal, and supplies the data to the pitch extraction unit 32.

The calibration audio data may, for example, take the form of a PCM (Pulse Code Modulation) digital signal, and need only represent the result of sampling the sound received by the calibration audio input unit 31 at a constant period sufficiently shorter than the pitch of that sound.

The pitch extraction unit 32 and the subband analysis unit 33 each comprise a processor such as a DSP (Digital Signal Processor) or a CPU and a memory such as a RAM (Random Access Memory). A single processor or a single memory may perform some or all of the functions of the pitch extraction unit 32 and the subband analysis unit 33. The processor that performs the function of the sound quality designation data input unit 2 may also perform some or all of the functions of the pitch extraction unit 32 and the subband analysis unit 33.

Functionally, the pitch extraction unit 32 comprises, for example as shown in FIG. 2, a cepstrum analysis unit 321, an autocorrelation analysis unit 322, a weight calculation unit 323, a BPF (Band Pass Filter) coefficient calculation unit 324, a bandpass filter 325, a zero-cross analysis unit 326, a waveform correlation analysis unit 327, a phase adjustment unit 328, and a resampling unit 329.

A single processor or a single memory may perform some or all of the functions of the cepstrum analysis unit 321, the autocorrelation analysis unit 322, the weight calculation unit 323, the BPF coefficient calculation unit 324, the bandpass filter 325, the zero-cross analysis unit 326, the waveform correlation analysis unit 327, the phase adjustment unit 328, and the resampling unit 329.

The cepstrum analysis unit 321 applies cepstrum analysis to the calibration audio data supplied from the calibration audio input unit 31 to identify the fundamental frequency and the formant frequencies of the sound represented by the calibration audio data. It then generates data indicating the identified fundamental frequency and supplies it to the weight calculation unit 323, and generates data indicating the identified formant frequencies and supplies it to the subband analysis unit 33.

Specifically, when supplied with calibration audio data from the calibration audio input unit 31, the cepstrum analysis unit 321 first obtains the spectrum of the calibration audio data by the fast Fourier transform (or by any other method that generates data representing the Fourier transform of a discrete variable).

Next, the cepstrum analysis unit 321 converts the intensity of each component of the obtained spectrum into the logarithm of its original value. (The base of the logarithm is arbitrary; the common logarithm, for example, may be used.)
Next, the cepstrum analysis unit 321 obtains the result of applying an inverse Fourier transform to the converted spectrum (that is, the cepstrum) by the fast inverse Fourier transform (or by any other method that generates data representing the inverse Fourier transform of a discrete variable).

Then, on the basis of the obtained cepstrum, the cepstrum analysis unit 321 identifies the fundamental frequency of the sound represented by the cepstrum, generates data indicating the identified fundamental frequency, and supplies it to the weight calculation unit 323.
Specifically, the cepstrum analysis unit 321 may, for example, filter (that is, lifter) the obtained cepstrum to extract the components at or above a predetermined quefrency (the long components), and identify the fundamental frequency from the position of the peak of the extracted long components.
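The cepstrum steps described above (spectrum, logarithm, inverse transform, peak search in the long-quefrency region) can be illustrated with a minimal sketch. The function names, the search band `min_f0`/`max_f0`, and the small constant added before the logarithm are assumptions made for this illustration and do not come from the patent; a naive DFT stands in for the fast Fourier transform to keep the example dependency-free.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (stand-in for an FFT)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out

def cepstrum_f0(samples, sample_rate, min_f0=60.0, max_f0=400.0):
    """Estimate the fundamental frequency from the cepstrum peak.

    min_f0 / max_f0 are hypothetical search limits; 1/max_f0 plays the
    role of the "predetermined quefrency" above which components are kept.
    """
    spectrum = dft(samples)
    # Logarithm of each spectral magnitude (tiny offset avoids log(0)).
    log_mag = [math.log10(abs(c) + 1e-12) for c in spectrum]
    # Inverse transform of the log spectrum gives the cepstrum.
    cep = [c.real for c in dft(log_mag, inverse=True)]
    lo = int(sample_rate / max_f0)                       # shortest pitch period in samples
    hi = min(int(sample_rate / min_f0), len(cep) // 2)   # longest pitch period in samples
    peak = max(range(lo, hi + 1), key=lambda q: cep[q])
    return sample_rate / peak
```

A harmonic-rich test tone with a 40-sample period at 8000 Hz should yield roughly 200 Hz, provided the search band excludes integer multiples of the true pitch period.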

When supplied with calibration audio data from the calibration audio input unit 31, the autocorrelation analysis unit 322 identifies the fundamental frequency of the sound represented by the calibration audio data by an analysis based on the autocorrelation function of the waveform of the calibration audio data, generates data indicating the identified fundamental frequency, and supplies it to the weight calculation unit 323.

Specifically, when supplied with calibration audio data from the calibration audio input unit 31, the autocorrelation analysis unit 322 first identifies the autocorrelation function r(l) represented by the right-hand side of Equation 1.

Next, among the frequencies at which the function obtained by Fourier-transforming the autocorrelation function r(l) (the periodogram) has local maxima, the autocorrelation analysis unit 322 identifies the smallest one exceeding a predetermined lower limit as the fundamental frequency, generates data indicating the identified fundamental frequency, and supplies it to the weight calculation unit 323.
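As a rough illustration of this autocorrelation-based search, the sketch below computes a periodogram from an autocorrelation function and returns the lowest local maximum above a lower frequency limit. Equation 1 is not reproduced in this text, so the circular autocorrelation used here is an assumption, as are the function names and the `rel_threshold` guard that ignores numerically negligible maxima.

```python
import cmath
import math

def autocorrelation(x):
    """Assumed form: circular r(l) = (1/N) * sum_t x[t] * x[(t + l) mod N]."""
    n = len(x)
    return [sum(x[t] * x[(t + lag) % n] for t in range(n)) / n
            for lag in range(n)]

def periodogram_f0(x, sample_rate, lower_limit_hz=50.0, rel_threshold=0.1):
    """Lowest periodogram peak above lower_limit_hz, or None if none is found.

    rel_threshold is a hypothetical guard: maxima smaller than this
    fraction of the largest peak are treated as rounding noise.
    """
    n = len(x)
    r = autocorrelation(x)
    # Fourier transform of r(l); only the first half of the bins is needed.
    power = [
        sum(r[lag] * cmath.exp(-2j * math.pi * k * lag / n) for lag in range(n)).real
        for k in range(n // 2)
    ]
    floor = rel_threshold * max(power[1:])
    for k in range(1, n // 2 - 1):
        freq = k * sample_rate / n
        if freq <= lower_limit_hz:
            continue
        is_peak = power[k] > power[k - 1] and power[k] >= power[k + 1]
        if is_peak and power[k] >= floor:
            return freq
    return None
```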

When the weight calculation unit 323 has been supplied with two data items indicating the fundamental frequency, one each from the cepstrum analysis unit 321 and the autocorrelation analysis unit 322, it obtains the average of the absolute values of the reciprocals of the fundamental frequencies indicated by the two data items. It then generates data indicating the obtained value (that is, the average pitch length) and supplies it to the BPF coefficient calculation unit 324.

When the BPF coefficient calculation unit 324 has been supplied with the data indicating the average pitch length from the weight calculation unit 323 and with the zero-cross signal (described later) from the zero-cross analysis unit 326, it determines, on the basis of the supplied data and zero-cross signal, whether the average pitch length and the zero-crossing period of the pitch signal differ from each other by a predetermined amount or more. If it determines that they do not, it controls the frequency characteristic of the bandpass filter 325 so that the reciprocal of the zero-crossing period becomes the center frequency (the frequency at the center of the passband of the bandpass filter 325). If it determines that they differ by the predetermined amount or more, it controls the frequency characteristic of the bandpass filter 325 so that the reciprocal of the average pitch length becomes the center frequency.
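The averaging done by the weight calculation unit and the center-frequency decision made by the BPF coefficient calculation unit amount to a few lines of arithmetic. The sketch below is illustrative only: the function names and the `tolerance` parameter (the "predetermined amount") are hypothetical, and the zero-crossing period is assumed to be available as a single number in seconds.

```python
def average_pitch_length(f0_cepstrum, f0_autocorr):
    """Average of the absolute reciprocals of two fundamental-frequency estimates."""
    return (abs(1.0 / f0_cepstrum) + abs(1.0 / f0_autocorr)) / 2.0

def choose_center_frequency(avg_pitch_length, zero_cross_period, tolerance):
    """Pick the bandpass center frequency in Hz.

    If the two period estimates agree to within `tolerance` (seconds),
    the zero-crossing period is used; otherwise the average pitch length is.
    """
    if abs(avg_pitch_length - zero_cross_period) >= tolerance:
        return 1.0 / avg_pitch_length
    return 1.0 / zero_cross_period
```

For example, estimates of 200 Hz and 250 Hz give an average pitch length of 4.5 ms; a zero-crossing period of 4.6 ms with a 1 ms tolerance would then be accepted as the basis for the center frequency.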

The bandpass filter 325 performs the function of an FIR (Finite Impulse Response) filter whose center frequency is variable.
Specifically, the bandpass filter 325 sets its center frequency to the value dictated by the BPF coefficient calculation unit 324. It then filters the calibration audio data supplied from the calibration audio input unit 31 and supplies the filtered calibration audio data (the pitch signal) to the zero-cross analysis unit 326 and the waveform correlation analysis unit 327. The pitch signal is assumed to consist of digital data whose sampling interval is substantially the same as that of the calibration audio data. The bandwidth of the bandpass filter 325 is desirably such that the upper limit of its passband always stays within twice the fundamental frequency of the sound represented by the calibration audio data.

The zero-cross analysis unit 326 identifies the timings at which the instantaneous value of the pitch signal supplied from the bandpass filter 325 becomes 0 (the zero-crossing times), and supplies a signal representing the identified timings (the zero-cross signal) to the BPF coefficient calculation unit 324.
Alternatively, the zero-cross analysis unit 326 may identify the timings at which the instantaneous value of the pitch signal reaches a predetermined nonzero value, and supply a signal representing those timings to the BPF coefficient calculation unit 324 in place of the zero-cross signal.
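A minimal sketch of such a crossing detector follows. It is an illustration under stated assumptions rather than the patent's implementation: the function name is hypothetical, crossing times between samples are estimated by linear interpolation (a choice made here), and the `level` parameter covers the variant that uses a predetermined nonzero value.

```python
def zero_cross_times(signal, sample_rate, level=0.0):
    """Return the times (in seconds) at which `signal` crosses `level`.

    A sample exactly equal to `level` counts as a crossing; a sign change
    between neighbouring samples is located by linear interpolation.
    The final sample is not examined on its own.
    """
    times = []
    for i in range(1, len(signal)):
        a = signal[i - 1] - level
        b = signal[i] - level
        if a == 0.0:
            times.append((i - 1) / sample_rate)
        elif a * b < 0.0:
            frac = a / (a - b)  # fraction of the way from sample i-1 to sample i
            times.append((i - 1 + frac) / sample_rate)
    return times
```

The spacing between successive returned times can then serve as the zero-crossing period compared against the average pitch length.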

When supplied with calibration audio data from the calibration audio input unit 31, the waveform correlation analysis unit 327 divides the calibration audio data at the timings at which the boundaries of a unit period (for example, one period) of the pitch signal supplied from the bandpass filter 325 arrive. Then, for each resulting section, it obtains the correlation between the pitch signal in the section and versions of the calibration audio data in the section whose phase has been variously changed, and identifies the phase giving the highest correlation as the phase of the calibration audio data in that section.

Specifically, for each section, the waveform correlation analysis unit 327 obtains, for example, the value cor represented by the right-hand side of Equation 2 for each of various values of φ, which represents the phase (φ is an integer of 0 or more). The waveform correlation analysis unit 327 then identifies the value Ψ of φ that maximizes cor, generates data indicating Ψ, and supplies it to the phase adjustment unit 328 as phase data representing the phase of the calibration audio data in the section.

The time length of a section is desirably about one pitch. The longer the section, the more samples it contains, which increases the data volume of the calibration pitch waveform data (described later); alternatively, the sampling interval must be increased, which makes the sound represented by the calibration pitch waveform data inaccurate.

When the phase adjustment unit 328 has been supplied with the calibration audio data from the calibration audio input unit 31 and with the data indicating the phase Ψ of each section of the calibration audio data from the waveform correlation analysis unit 327, it aligns the phases of the sections by shifting the phase of the calibration audio data in each section by (−Ψ). It then supplies the phase-shifted calibration audio data (the calibration pitch waveform data) to the resampling unit 329.
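The phase search and the subsequent shift can be sketched as follows. Equation 2 is not reproduced in this text, so the correlation used here, a sum of products of the circularly shifted section and the pitch signal, is an assumption, as are the function names and the sign convention of the shift.

```python
def best_phase(section, pitch_section):
    """Return the integer shift phi (in samples) that maximizes the correlation.

    Assumed correlation: cor(phi) = sum_i section[(i - phi) mod n] * pitch_section[i].
    """
    n = len(section)

    def cor(phi):
        return sum(section[(i - phi) % n] * pitch_section[i] for i in range(n))

    return max(range(n), key=cor)

def shift_section(section, psi):
    """Circularly shift a section so that it lines up with the pitch signal."""
    n = len(section)
    return [section[(i - psi) % n] for i in range(n)]
```

Applying `shift_section` with the value returned by `best_phase` to every section gives sections whose phases are aligned with one another, which is the property the later resampling and subband analysis rely on.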

The resampling unit 329 resamples each section of the calibration pitch waveform data supplied from the phase adjustment unit 328 and supplies the resampled calibration pitch waveform data to the subband analysis unit 33.

The resampling unit 329 resamples so that the number of samples in each section becomes a fixed number that is substantially the same for all sections, with the samples equally spaced within each section. For a section whose number of samples falls short of this fixed number, it adds samples whose values interpolate between samples adjacent on the time axis by a predetermined method (for example, Lagrange interpolation), thereby bringing the number of samples in the section up to the fixed number.
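A minimal sketch of resampling one section to a fixed sample count is shown below. It uses linear interpolation between the two neighbouring samples (the simplest, first-order case of Lagrange interpolation) rather than a higher-order scheme; the function name and this choice of interpolation order are assumptions made for illustration.

```python
def resample_section(section, target_count):
    """Resample `section` to `target_count` equally spaced samples.

    The first and last samples are preserved; values in between are
    linearly interpolated from the two nearest original samples.
    """
    n = len(section)
    if n == 1 or target_count == 1:
        return [section[0]] * target_count
    out = []
    for j in range(target_count):
        pos = j * (n - 1) / (target_count - 1)  # position on the original sample grid
        i = min(int(pos), n - 2)                # index of the left neighbour
        frac = pos - i
        out.append(section[i] * (1.0 - frac) + section[i + 1] * frac)
    return out
```

For instance, stretching the three samples 0, 2, 4 to five samples gives 0, 1, 2, 3, 4.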

The subband analysis unit 33 generates a calibration subband data group by applying an orthogonal transform such as the DCT (Discrete Cosine Transform) to the calibration pitch waveform data supplied from the resampling unit 329, and supplies the generated calibration subband data group to the sound quality adjustment unit 4.
The calibration subband data group contains the 0th calibration subband data, which represents the temporal change in the intensity of the fundamental frequency component of the sound represented by the calibration pitch waveform data supplied to the subband analysis unit 33, and n calibration subband data items (n is the natural number mentioned above), the 1st through the n-th, which represent the temporal changes in the intensities of n harmonic components of that sound.
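One way to picture this step is to transform each equal-length pitch section separately and then collect the same coefficient from successive sections into a time series. The sketch below does that with an unnormalized DCT-II. The function names are hypothetical, and how coefficient indices map onto the 0th through n-th subbands is not spelled out in this passage, so the sketch simply keeps one series per coefficient index.

```python
import math

def dct2(block):
    """Unnormalized DCT-II of one pitch section."""
    m = len(block)
    return [
        sum(block[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * m)) for i in range(m))
        for k in range(m)
    ]

def subband_series(sections, n_bands):
    """Return n_bands time series, one per DCT coefficient index.

    sections: list of equal-length pitch sections (lists of samples).
    Result[k][t] is coefficient k of section t, i.e. how that component's
    strength changes from one pitch section to the next.
    """
    coeffs = [dct2(s) for s in sections]
    return [[c[k] for c in coeffs] for k in range(n_bands)]
```

Because every section has the same length and aligned phase, each coefficient index refers to the same relative frequency in every section, which is what makes the per-coefficient series meaningful.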

When the sound quality adjustment unit 4 is supplied with the subband data group from the subband data input unit 1 and with the calibration subband data group from the subband analysis unit 33 of the calibration data generation unit 3, it changes each subband data item in the subband data group so that the intensity of the k-th subband data item (k is an integer from 0 to n) becomes the value Y(k) given by Equation 3. It then supplies the changed subband data group to the subband synthesis unit 51.

(Equation 3)
Y(k) = {α · X(k)}^2 / R(k)
(where X(k) is the intensity of the k-th subband data item in the subband data group before the change, R(k) is the intensity of the k-th calibration subband data item in the calibration subband data group, and α is a predetermined proportionality coefficient)
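Equation 3 translates directly into a per-subband computation. The sketch below is illustrative: the function name is hypothetical, and the `eps` guard against division by a zero calibration intensity is an assumption that does not appear in the patent.

```python
def adjust_intensities(x, r, alpha=1.0, eps=1e-12):
    """Apply Y(k) = (alpha * X(k))**2 / R(k) to every subband k.

    x: intensities X(k) of the input subband data
    r: intensities R(k) of the calibration subband data
    eps: hypothetical lower bound on R(k) to avoid division by zero
    """
    if len(x) != len(r):
        raise ValueError("x and r must cover the same subbands")
    return [(alpha * xk) ** 2 / max(rk, eps) for xk, rk in zip(x, r)]
```

With X = [2, 3], R = [4, 1] and α = 1, the adjusted intensities are [1, 9]: a subband that the playback path reproduces too strongly (large R) is attenuated, and one reproduced weakly is boosted.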

Suppose the calibration data generation unit 3 is arranged to receive the sound reproduced by the audio playback unit 5. If the time from when the audio playback unit 5 reproduces the sound represented by the subband data group supplied from the sound quality adjustment unit 4 until the calibration subband data group representing that sound is generated and supplied to the sound quality adjustment unit 4 is negligibly short, and if R(k) can be regarded as proportional to Y(k), then the value of Y(k) is in effect adjusted to a value proportional to {α · X(k)}. (This assumes that no sound quality designation data is supplied from the sound quality designation data input unit 2.)
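The proportionality stated here follows from Equation 3 in one step. Writing the assumed proportionality as R(k) = c · Y(k), where the positive constant c is a symbol introduced only for this illustration:

```latex
\begin{align}
Y(k) &= \frac{\{\alpha X(k)\}^{2}}{R(k)} = \frac{\{\alpha X(k)\}^{2}}{c\,Y(k)} \\
Y(k)^{2} &= \frac{\{\alpha X(k)\}^{2}}{c} \\
Y(k) &= \frac{\alpha X(k)}{\sqrt{c}} \qquad (\alpha X(k) \ge 0,\; c > 0)
\end{align}
```

Under that assumption the adjusted intensity settles at a fixed multiple of α · X(k), independent of the uncorrected response of the playback path.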

However, when sound quality designation data is supplied from the sound quality designation data input unit 2, the sound quality adjustment unit 4 further changes the intensity of each subband data item in the modified subband data group to the intensity specified by the sound quality designation data, thereby adjusting the sound quality of the audio that the subband data group represents as a whole. For example, if the sound quality designation data represents a coefficient by which the intensity of the component represented by a subband data item is to be multiplied, the intensity of that subband data item is changed so that the product of the component's intensity and the coefficient becomes the new intensity. The subband data group whose sound quality has been adjusted by this further change in intensity is then supplied to the subband synthesis unit 51.
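The two-stage adjustment described above (Equation 3, then the optional per-subband coefficient) might be sketched as follows. The function name `adjust_subbands`, the list-based representation of intensities, and the `eps` guard against a zero R(k) are assumptions made for this illustration; the source does not say how R(k) = 0 is handled.

```python
def adjust_subbands(x, r, alpha, coeffs=None, eps=1e-12):
    """Return adjusted intensities Y(k) for subband intensities X(k).

    x      -- intensities X(k) before the change
    r      -- calibration intensities R(k), same length as x
    alpha  -- the proportionality coefficient of Equation 3
    coeffs -- optional per-subband multipliers; an assumed encoding of
              the sound quality designation data
    eps    -- assumed lower bound on R(k) to avoid division by zero
    """
    if len(x) != len(r):
        raise ValueError("x and r must have the same length")

    # Stage 1: Equation 3, Y(k) = {alpha * X(k)}^2 / R(k).
    y = [(alpha * xk) ** 2 / max(rk, eps) for xk, rk in zip(x, r)]

    # Stage 2: applied only when designation data has been supplied.
    if coeffs is not None:
        if len(coeffs) != len(y):
            raise ValueError("coeffs must have one entry per subband")
        y = [yk * ck for yk, ck in zip(y, coeffs)]
    return y
```

Calling `adjust_subbands(x, r, alpha)` corresponds to the case without designation data; passing `coeffs` corresponds to the multiplicative example given in the paragraph above.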

The audio reproduction unit 5 comprises a subband synthesis unit 51, an audio waveform restoration unit 52, and an audio output unit 53.
Of these, the subband synthesis unit 51 and the audio waveform restoration unit 52 each comprise a processor such as a DSP or CPU and a memory such as a RAM. A single processor or a single memory may perform some or all of the functions of both the subband synthesis unit 51 and the audio waveform restoration unit 52. Likewise, a processor that performs some or all of the functions of the sound quality designation data input unit 2, the pitch extraction unit 32, and the subband analysis unit 33 may also perform some or all of the functions of the subband synthesis unit 51 and the audio waveform restoration unit 52.

When the subband synthesis unit 51 is supplied with a subband data group from the sound quality adjustment unit 4, it applies a transform to the group to restore either pitch waveform data whose frequency component intensities are represented by the subband data group (that is, audio data in which each section corresponding to one unit pitch of the audio has been phase-shifted so that the phases of the sections are aligned) or audio data that has not undergone the phase alignment process. It then supplies the restored pitch waveform data or audio data to the audio waveform restoration unit 52.

The transform that the subband synthesis unit 51 applies to the subband data group is substantially the inverse of the transform that was applied to the audio data to generate the subband data group acquired by the subband data input unit 1. For example, if the subband data group was generated by applying a DCT to pitch waveform data, the subband synthesis unit 51 may apply an IDCT (inverse DCT) to the subband data group.
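To illustrate the inverse-transform relationship, the following is a minimal pure-Python sketch of a DCT and its inverse. It assumes the orthonormal DCT-II and DCT-III pair; the source names only "DCT" and "IDCT" and does not specify the variant or scaling, and the function names `dct` and `idct` are chosen here for illustration.

```python
import math


def dct(x):
    """Orthonormal DCT-II of the sequence x."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def idct(c):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        s = 0.0
        for k in range(n):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            s += scale * c[k] * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(s)
    return out


# Example: scale one coefficient, then return to the waveform domain.
section = [0.0, 0.7, 1.0, 0.7, 0.0, -0.7, -1.0, -0.7]
coeffs = dct(section)
coeffs[1] *= 0.5          # attenuate a single component
restored = idct(coeffs)   # waveform with that component reduced
```

Because the two functions are exact inverses up to rounding, `idct(dct(x))` reproduces `x`; any change made to a coefficient between the two calls is the only change that appears in the restored waveform.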

If the data supplied from the subband synthesis unit 51 is pitch waveform data, the audio waveform restoration unit 52 changes the time length of each section of the pitch waveform data to the time length indicated by the pitch information supplied from the subband data input unit 1. The time length of a section may be changed, for example, by changing the spacing and/or the number of the samples in the section. The audio waveform restoration unit 52 then supplies the pitch waveform data with the changed section lengths (that is, audio data representing the restored audio) to the audio output unit 53.
If, on the other hand, the data supplied from the subband synthesis unit 51 is audio data that has not undergone the phase alignment process, the audio waveform restoration unit 52 supplies it to the audio output unit 53 as audio data representing the restored audio.

The audio output unit 53 includes, for example, a control circuit that performs the function of a PCM decoder, a D/A (digital-to-analog) converter, an AF (audio frequency) amplifier, and a speaker.
When the audio output unit 53 is supplied with audio data representing the restored audio from the audio waveform restoration unit 52, it demodulates the audio data, performs D/A conversion and amplification, and drives the speaker with the resulting analog signal to reproduce the audio.

By performing the operations described above under the condition that the calibration data generation unit 3 picks up the sound reproduced by the audio reproduction unit 5, this sound quality adjustment device adjusts the sound quality of the audio represented by the subband data.

The sound quality is adjusted by changing the intensity of the components represented by the subband data. Because subband data can be regarded as a DC signal unless the intensity of the fundamental frequency component or a harmonic component of the audio changes particularly steeply over time, the configuration of this sound quality adjustment device can be kept simple and the device is easy to manufacture.

Moreover, unlike filtering with a finite-order filter, changing the intensity of subband data that can be regarded as a DC signal yields the desired characteristic exactly, so the sound quality is adjusted accurately.

In addition, this sound quality adjustment device can use the audio that it reproduces itself from arbitrary externally acquired subband data as the test audio. There is therefore no need to set aside time for adjusting the sound quality with a predetermined test signal; the sound quality can be adjusted while the audio actually intended for playback is being reproduced.

The configuration of this sound quality adjustment device is not limited to the one described above.
For example, the device need not include both the sound quality designation data input unit 2 and the calibration data generation unit 3. If the device does not include the calibration data generation unit 3 (or if no calibration subband data is supplied from the calibration data generation unit 3), the sound quality adjustment unit 4 may treat the subband data group supplied from the subband data input unit 1 as a subband data group whose values have already been changed, and immediately change the intensity of each subband data item in it to the intensity specified by the sound quality designation data.

The subband data input unit 1 may also acquire subband data from outside via a communication line such as a telephone line, a dedicated line, or a satellite link. In this case, the subband data input unit 1 need only include a communication control device such as a modem.
Similarly, the sound quality designation data input unit 2 may include a communication control device and acquire sound quality designation data from outside via a communication line.
A single recording medium drive or communication control device may serve as both the subband data input unit 1 and the sound quality designation data input unit 2.

The pitch extraction unit 32 need not include the cepstrum analysis unit 321 (or the autocorrelation analysis unit 322). In this case, the weight calculation unit 323 may simply treat the reciprocal of the fundamental frequency obtained by whichever of the two analysis units is present as the average pitch length.
The waveform correlation analysis unit 327 may also supply the pitch signal received from the bandpass filter 325, unchanged, to the cepstrum analysis unit 321 as the zero-cross signal.

The sound quality adjustment unit 4 may also remove noise from subband data by filtering the subband data to substantially remove its AC component.
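One simple way to attenuate the AC component of a subband's time series is a moving average, sketched below. This is only an illustration: the source does not specify the filter, a moving average reduces rapid fluctuations rather than removing them completely, and the function name `smooth_subband` and the `window` parameter are assumptions.

```python
def smooth_subband(values, window=3):
    """Moving-average low-pass over one subband's intensity time series.

    values -- intensity of one frequency component, one value per section
    window -- assumed averaging width in samples (truncated at the edges)
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    half = window // 2
    out = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out
```

A series that is already constant (a pure DC signal) passes through unchanged, while an isolated spike is spread out and reduced in height.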

An embodiment of the present invention has been described above. The audio signal adjustment device according to the present invention can be realized with an ordinary computer system rather than a dedicated system.
For example, a sound quality adjustment device that executes the processing described above can be configured by installing, on a personal computer equipped with a microphone, a sampling circuit, an A/D converter, a D/A converter, a speaker, and the like, a program for executing the operations of the subband data input unit 1, the sound quality designation data input unit 2, the calibration data generation unit 3, the sound quality adjustment unit 4, and the audio reproduction unit 5, from a medium (CD-ROM, flexible disk, etc.) that stores the program.

Assume that the personal computer executing this program performs the processing shown in FIGS. 3 and 4 as the processing corresponding to the operation of the audio adjustment device of FIG. 1. FIGS. 3 and 4 are flowcharts showing the processing executed by this personal computer.

First, the personal computer acquires the subband data group described above from outside (FIG. 3, step S101). If the audio represented by the subband data group has had the phases of its sections aligned by phase-shifting each section corresponding to one unit pitch, and pitch information for that audio is available, the personal computer also acquires the pitch information (step S101). If the operator performs an operation to input sound quality designation data, the personal computer acquires the sound quality designation data in accordance with that operation (step S101).

Meanwhile, the personal computer picks up sound, samples it, and A/D-converts it to generate digital calibration audio data (step S102). It then filters the calibration audio data to generate filtered calibration audio data (a pitch signal) (step S103).

The personal computer determines the characteristics of the filtering used to generate the pitch signal by feedback processing based on the pitch length described below and on the times at which the instantaneous value of the pitch signal becomes 0 (the zero-crossing times).

Specifically, the personal computer applies, for example, the cepstrum analysis or the autocorrelation-function-based analysis described above to the audio data generated from the picked-up sound, thereby identifying the fundamental frequency of the audio represented by that data, and obtains the absolute value of the reciprocal of the fundamental frequency (that is, the pitch length) (step S104). (Alternatively, the personal computer may perform both the cepstrum analysis and the autocorrelation-function-based analysis to identify two fundamental frequencies, and take the average of the absolute values of their reciprocals as the pitch length.)
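As a rough illustration of the autocorrelation route in step S104, the sketch below picks the lag that maximizes an unnormalized autocorrelation and converts it to a pitch length in seconds. The function name, the explicit lag search range, and the lack of normalization are assumptions made for this example; the analysis actually used in the embodiment is the one described earlier in the document.

```python
def pitch_length_autocorr(samples, sample_rate, min_lag, max_lag):
    """Estimate the pitch length (seconds) from the autocorrelation peak.

    min_lag, max_lag -- assumed search range in samples; restricting the
    range avoids the trivial peak at lag 0.
    """
    if not 0 < min_lag <= max_lag < len(samples):
        raise ValueError("need 0 < min_lag <= max_lag < len(samples)")
    best_lag = min_lag
    best_val = float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation at this lag.
        val = sum(samples[i] * samples[i + lag]
                  for i in range(len(samples) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    # The pitch length is the reciprocal of the fundamental frequency.
    return best_lag / sample_rate
```

The reciprocal of the returned value is the corresponding fundamental frequency estimate. When both analyses are run, the two resulting pitch lengths can simply be averaged, as the parenthetical above describes.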

The personal computer also identifies the times at which the pitch signal crosses zero (step S105). It then determines whether the pitch length and the zero-crossing period of the pitch signal differ from each other by a predetermined amount or more (step S106). If they do not, the filtering is performed with the characteristics of a bandpass filter whose center frequency is the reciprocal of the zero-crossing period (step S107). If they do differ by the predetermined amount or more, the filtering is performed with the characteristics of a bandpass filter whose center frequency is the reciprocal of the pitch length (step S108).
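The decision in steps S106 to S108 can be sketched as a small function. The name `choose_center_frequency` and the `threshold` parameter (standing in for the "predetermined amount") are assumptions for this illustration; both periods are taken to be in seconds.

```python
def choose_center_frequency(pitch_length, zero_cross_period, threshold):
    """Return the bandpass center frequency selected by steps S106-S108."""
    if pitch_length <= 0 or zero_cross_period <= 0:
        raise ValueError("periods must be positive")
    # S106: do the two period estimates disagree by the threshold or more?
    if abs(pitch_length - zero_cross_period) >= threshold:
        # S108: fall back to the pitch length from the analysis of S104.
        return 1.0 / pitch_length
    # S107: the estimates agree, so track the zero-crossing period.
    return 1.0 / zero_cross_period


# Example call with hypothetical values (5 ms pitch length, 4.8 ms
# zero-crossing period, 1 ms tolerance).
center_hz = choose_center_frequency(0.005, 0.0048, 0.001)
```

The effect is that the filter follows the measured zero-crossing period while it stays close to the independently estimated pitch length, and is reset to the pitch length whenever the two drift apart.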

Next, the personal computer divides the audio data read from the recording medium at the times at which a unit-period boundary of the generated pitch signal occurs (specifically, at the times at which the pitch signal crosses zero) (step S109). For each resulting section, it computes the correlation between the pitch signal in that section and the audio data in that section with its phase varied in various ways, and identifies the phase that gives the highest correlation as the phase of the audio data in that section (FIG. 4, step S110). It then phase-shifts each section of the audio data so that all sections have substantially the same phase, thereby generating the calibration pitch waveform data (step S111). Specifically, for each section the personal computer identifies the value Ψ described above in step S110 and shifts the audio data in that section by (−Ψ) in step S111.
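Steps S110 and S111 can be illustrated with circular shifts measured in whole samples. This is a simplified sketch: it assumes that the section and the pitch signal have the same number of samples, that the phase is searched only at integer sample offsets, and that the shift wraps around within the section. The function names are hypothetical.

```python
def find_phase(section, pitch_section):
    """S110: offset psi (in samples) giving the highest correlation."""
    n = len(section)
    if n == 0 or n != len(pitch_section):
        raise ValueError("sections must be non-empty and of equal length")

    def correlation(psi):
        # Correlate the pitch signal with the section advanced by psi.
        return sum(section[(i + psi) % n] * pitch_section[i]
                   for i in range(n))

    return max(range(n), key=correlation)


def align_section(section, pitch_section):
    """S111: undo the offset found in S110, i.e. shift by -psi."""
    psi = find_phase(section, pitch_section)
    n = len(section)
    return [section[(i + psi) % n] for i in range(n)]
```

Applying `align_section` to every section leaves all sections with substantially the same phase relative to the pitch signal, which is the property the calibration pitch waveform data needs before resampling.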

Next, the personal computer resamples each section of the calibration pitch waveform data (step S112). The resampling should make the number of samples in every section of the pitch waveform data approximately equal, with the samples equally spaced within each section. For a section with fewer samples than this fixed number, samples whose values interpolate between temporally adjacent samples by a predetermined method are added to bring the section up to the fixed number.
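The resampling of step S112 might look like the following sketch, which uses linear interpolation. The text says only that interpolation is performed "by a predetermined method", so the choice of linear interpolation, the endpoint-preserving mapping, and the name `resample_section` are all assumptions.

```python
def resample_section(samples, target_len):
    """Resample one section to target_len equally spaced samples."""
    n = len(samples)
    if n == 0 or target_len < 1:
        raise ValueError("need a non-empty section and target_len >= 1")
    if n == 1 or target_len == 1:
        return [float(samples[0])] * target_len

    step = (n - 1) / (target_len - 1)  # spacing in source-sample units
    out = []
    for i in range(target_len):
        pos = i * step
        lo = min(int(pos), n - 2)      # left neighbour, clamped at the end
        frac = pos - lo
        # Linear interpolation between the two neighbouring samples.
        out.append(samples[lo] * (1 - frac) + samples[lo + 1] * frac)
    return out
```

Running every section through this function with the same `target_len` gives sections with equal sample counts, so the orthogonal transform of the next step sees one value per frequency component per section.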

Next, the personal computer applies an orthogonal transform to the calibration pitch waveform data to generate the calibration subband data group (step S113). Then, taking X(k) as the pre-change intensity of the k-th subband data item in the subband data group acquired in step S101 and R(k) as the intensity of the k-th subband data item in the calibration subband data group generated in step S113, it changes each subband data item in the subband data group acquired in step S101 so that the intensity of the k-th item becomes the value Y(k) given by Equation 3 (step S114), and proceeds to step S116. While no calibration subband data group has yet been created, the personal computer may treat the subband data group acquired in step S101 as having already undergone the processing of step S114.

Next, if the personal computer has also acquired sound quality designation data, it further changes the intensity of each subband data item in the subband data group modified in step S114 to the intensity specified by the sound quality designation data, thereby adjusting the sound quality of the audio that the subband data group represents as a whole (step S115), and proceeds to step S116.

In step S116, the personal computer applies a transform to the subband data group that has passed through the processing up to step S114 or S115, thereby restoring the pitch waveform data or audio data whose frequency component intensities are represented by that subband data group. The transform applied in step S116 is substantially the inverse of the transform that was applied to the audio data to generate the subband data group acquired in step S101.

Next, if the data generated in step S116 is pitch waveform data, the personal computer changes the time length of each section of the pitch waveform data to the time length indicated by the pitch information acquired in step S101 (step S117) and proceeds to step S118. If, on the other hand, the data generated in step S116 is audio data that has not undergone the phase alignment process, the personal computer skips step S117 and proceeds directly to step S118.

In step S118, the personal computer demodulates the audio data obtained by the processing up to step S116 or step S117, performs D/A conversion and amplification, and reproduces the audio using the resulting analog signal.

This program may, for example, be uploaded to a bulletin board system (BBS) on a communication line and distributed over that line. Alternatively, a carrier wave may be modulated with a signal representing the program and the resulting modulated wave transmitted, so that a device receiving the modulated wave demodulates it to restore the program.
The processing described above can then be executed by starting the program and running it under the control of an OS in the same way as other application programs.

If the OS performs part of the processing, or if the OS constitutes part of one component of the present invention, the recording medium may store the program with that part omitted. In this case too, for the purposes of the present invention, the recording medium is regarded as storing a program for executing each function or step executed by the computer.

A block diagram showing the configuration of the sound quality adjustment device according to the embodiment of the present invention.
A block diagram showing the configuration of the pitch extraction unit.
A flowchart showing the processing executed by a personal computer that performs the functions of the audio adjustment device according to the embodiment of the present invention.
A continuation of the flowchart showing the processing executed by a personal computer that performs the functions of the audio adjustment device according to the embodiment of the present invention.

Explanation of symbols

1 Subband data input unit
2 Sound quality designation data input unit
3 Calibration data generation unit
31 Calibration audio input unit
32 Pitch extraction unit
321 Cepstrum analysis unit
322 Autocorrelation analysis unit
323 Weight calculation unit
324 BPF coefficient calculation unit
325 Bandpass filter
326 Zero-cross analysis unit
327 Waveform correlation analysis unit
328 Phase adjustment unit
329 Resampling unit
33 Subband analysis unit
4 Sound quality adjustment unit
5 Audio reproduction unit
51 Subband synthesis unit
52 Audio waveform restoration unit
53 Audio output unit

Claims

1. An audio signal adjustment device comprising:
audio signal acquisition means for acquiring, from outside, an audio signal group consisting of audio signals each representing the temporal change in the intensity of the fundamental frequency component or a harmonic component of audio;
audio signal adjustment means for changing the intensity of an audio signal included in the audio signal group acquired by the audio signal acquisition means; and
waveform generation means for generating, based on the audio signal group in which the intensity of the audio signal has been changed, a signal representing the waveform of the audio represented by that audio signal group.

2. The audio signal adjustment device according to claim 1, further comprising designation data acquisition means for acquiring, from outside, designation data that designates the manner in which the intensity of an audio signal included in the audio signal group acquired by the audio signal acquisition means is to be changed,
wherein the audio signal adjustment means changes the intensity of the audio signal included in the audio signal group acquired by the audio signal acquisition means in the manner designated by the designation data acquired by the designation data acquisition means.

3. The audio signal adjustment device according to claim 1 or 2, further comprising calibration audio signal generation means for picking up audio and generating a calibration audio signal group consisting of calibration audio signals representing the temporal change in the intensity of the fundamental frequency component and the harmonic components of a processing-target audio signal that represents the waveform of the picked-up audio,
wherein the audio signal adjustment means determines the post-change value of the intensity of an audio signal included in the audio signal group acquired by the audio signal acquisition means based on the intensity of that audio signal and the intensity of the calibration audio signal representing a component of substantially the same frequency as that audio signal, and changes the intensity of the audio signal according to the result of the determination.

4. The audio signal adjustment device according to claim 3, wherein the calibration audio signal generation means comprises:
means for generating a signal representing the waveform of the picked-up audio and processing that signal into a pitch waveform signal by making the time lengths of the sections corresponding to a unit pitch of the signal substantially the same; and
means for generating, as the calibration audio signal, a signal representing the temporal change in the intensity of the fundamental frequency component and the harmonic components of the pitch waveform signal.

5. An audio signal adjustment method comprising:
acquiring, from outside, an audio signal group consisting of audio signals each representing the temporal change in the intensity of the fundamental frequency component or a harmonic component of audio;
changing the intensity of an audio signal included in the acquired audio signal group; and
generating, based on the audio signal group in which the intensity of the audio signal has been changed, a signal representing the waveform of the audio represented by that audio signal group.

6. A program for causing a computer to function as:
audio signal acquisition means for acquiring, from outside, an audio signal group consisting of audio signals each representing the temporal change in the intensity of the fundamental frequency component or a harmonic component of audio;
audio signal adjustment means for changing the intensity of an audio signal included in the audio signal group acquired by the audio signal acquisition means; and
waveform generation means for generating, based on the audio signal group in which the intensity of the audio signal has been changed, a signal representing the waveform of the audio represented by that audio signal group.