JP4645867B2

JP4645867B2 - DIGITAL SIGNAL PROCESSING METHOD, LEARNING METHOD, DEVICE THEREOF, AND PROGRAM STORAGE MEDIUM

Info

Publication number: JP4645867B2
Application number: JP2000238892A
Authority: JP
Inventors: 哲二郎近藤; 勉渡辺; 正明服部; 裕人木村
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2000-08-02
Filing date: 2000-08-02
Publication date: 2011-03-09
Anticipated expiration: 2020-08-02
Also published as: JP2002049383A

Abstract

PROBLEM TO BE SOLVED: To provide a digital signal processing method which further improves the waveform reproducibility of digital signals, a learning method and their devices, and a program storage medium. SOLUTION: The classes of digital audio signals D10 are classified based on the polarities of the signals D10. Then, the signals D10 are converted by a predicting system corresponding to the class being classified. Thus, conversion that is made further suitable to the characteristics of the signals D10 can be performed.

Description

【０００１】
【発明の属する技術分野】
本発明はディジタル信号処理方法、学習方法及びそれらの装置並びにプログラム格納媒体に関し、レートコンバータ又はＰＣＭ(Pulse Code Modulation) 復号装置等においてディジタル信号に対してデータの補間処理を行うディジタル信号処理方法、学習方法及びそれらの装置並びにプログラム格納媒体に適用して好適なものである。
【０００２】
【従来の技術】
従来、ディジタルオーディオ信号をディジタル／アナログコンバータに入力する前に、サンプリング周波数を元の値の数倍に変換するオーバサンプリング処理を行っている。これにより、ディジタル／アナログコンバータから出力されたディジタルオーディオ信号はアナログ・アンチ・エイリアス・フィルタの位相特性が可聴周波数高域で一定に保たれ、また、サンプリングに伴うディジタル系のイメージ雑音の影響が排除されるようになされている。
【０００３】
かかるオーバサンプリング処理では、通常、線形一次（直線）補間方式のディジタルフィルタが用いられている。このようなディジタルフィルタは、サンプリングレートが変わったりデータが欠落した場合等に、複数の既存データの平均値を求めて直線的な補間データを生成するものである。
【０００４】
【発明が解決しようとする課題】
ところが、オーバサンプリング処理後のディジタルオーディオ信号は、線形一次補間によって時間軸方向に対してデータ量が数倍に緻密になっているものの、オーバサンプリング処理後のディジタルオーディオ信号の周波数帯域は変換前とあまり変わらず、音質そのものは向上していない。さらに、補間されたデータは必ずしもＡ／Ｄ変換前のアナログオーディオ信号の波形に基づいて生成されたのではないため、波形再現性もほとんど向上していない。
【０００５】
また、サンプリング周波数の異なるディジタルオーディオ信号をダビングする場合において、サンプリング・レート・コンバータを用いて周波数を変換しているが、かかる場合でも線形一次ディジタルフィルタによって直線的なデータの補間しか行うことができず、音質や波形再現性を向上することが困難であった。さらに、ディジタルオーディオ信号のデータサンプルが欠落した場合において同様である。
【０００６】
本発明は以上の点を考慮してなされたもので、ディジタル信号の波形再現性を一段と向上し得るディジタル信号処理方法、学習方法及びそれらの装置並びにプログラム格納媒体を提案しようとするものである。
【０００７】
【課題を解決するための手段】
かかる課題を解決するため本発明においては、ゼロレベルを基準としてディジタルオーディオ信号の極性に基づいてディジタルオーディオ信号のクラスを分類し、当該分類されたクラスに対応した予測方式でディジタルオーディオ信号を変換するようにしたことにより、音素に応じてクラス分類することができるので、一段とディジタルオーディオ信号の特徴に適応した変換を行うことができる。
【０００８】
【発明の実施の形態】
以下図面について、本発明の一実施の形態を詳述する。
【０００９】
図１においてオーディオ信号処理装置１０は、ディジタルオーディオ信号（以下これをオーディオデータと呼ぶ）のサンプリングレートを上げたり、オーディオデータを補間する際に、真値に近いオーディオデータをクラス分類適用処理によって生成するようになされている。因みに、ディジタルオーディオ信号とは、人や動物が発する声を表す音声信号、楽器が発する楽音を表す楽音信号、及びその他の音を表す信号を意味するものである。
【００１０】
すなわち、オーディオ信号処理装置１０において、極性判別部１１は入力端子Ｔ_INから供給された図２に示す入力オーディオデータＤ１０を所定時間毎の領域（この実施の形態の場合、例えば６サンプル毎とする）に分割した後、当該分割された各時間領域の波形について、図２に示す極性判別方法によりその極性クラスを判別する。
【００１１】
すなわち図２において、分割された領域ＡＲ１（カレントデータＣ１）のように切り出されたすべてのタップが正である場合、この極性クラスを CLASS０とし、分割された領域ＡＲ２（カレントデータＣ２）のように切り出された領域内にゼロクロスが存在すると共にカレントデータ（Ｃ２）が正である場合、この極性クラスを CLASS１とし、分割された領域ＡＲ４（カレントデータＣ４）のように切り出された切り出された領域内にゼロクロスが存在すると共にカレントデータ（Ｃ４）が負である場合、この極性クラスを CLASS２とし、分割された領域ＡＲ３（カレントデータＣ３）のように切り出されたすべてのタップが負である場合、この極性クラスを CLASS３とする。
【００１２】
このように、オーディオデータＤ１０の極性に基づく極性クラスを設定することにより、オーディオデータＤ１０がゼロレベル近傍である場合の音素と大振幅部での音素とを正負両方の領域で一段と明確に区別することができる。
【００１３】
極性判別部１１は入力オーディオデータＤ１０のこのときのカレントデータに対応して求められた極性判別結果（ CLASS０、 CLASS１、 CLASS２又は CLASS３）を極性クラスデータＤ１１としてクラス分類部１４に供給する。
【００１４】
また、クラス分類部抽出部１２は入力端子Ｔ_INから供給された入力オーディオデータＤ１０を、極性判別出部１１の場合と同様の時間領域（この実施の形態の場合例えば６サンプル）に分割することによりクラス分類しようとするオーディオ波形データＤ１２を抽出し、これをクラス分類部１４に供給する。
【００１５】
クラス分類部１４は、クラス分類抽出部１２において切り出されたオーディオ波形データＤ１２について、当該オーディオ波形データＤ１２を圧縮して圧縮データパターンを生成するＡＤＲＣ(Adaptive Dynamic Range Coding) 回路部と、オーディオ波形データＤ１２の属するクラスコードを発生するクラスコード発生回路部とを有する。
【００１６】
ＡＤＲＣ回路部はオーディオ波形データＤ１２に対して、例えば８ビットから２ビットに圧縮するような演算を行うことによりパターン圧縮データを形成する。このＡＤＲＣ回路部は、適応的量子化を行うものであり、ここでは、信号レベルの局所的なパターンを短い語長で効率的に表現することができるので、信号パターンのクラス分類のコード発生用に用いられる。
【００１７】
具体的には、オーディオ波形上の６つの８ビットのデータ（オーディオ波形データ）をクラス分類しようとする場合、２⁴⁸という膨大な数のクラスに分類しなければならず、回路上の負担が多くなる。そこで、この実施の形態のクラス分類部１４ではその内部に設けられたＡＤＲＣ回路部で生成されるパターン圧縮データに基づいてクラス分類を行う。例えば６つのオーディオ波形データに対して１ビットの量子化を実行すると、６つのオーディオ波形データを６ビットで表すことができ、２⁶＝６４クラスに分類することができる。
【００１８】
ここで、ＡＤＲＣ回路部は、切り出された領域内のオーディオ波形のダイナミックレンジをＤＲ、ビット割り当をｍ、各オーディオ波形データのデータレベルをＬ、量子化コードをＱとすると、次式、
【００１９】
【数１】

【００２０】
に従って、領域内の最大値ＭＡＸと最小値ＭＩＮとの間を指定されたビット長で均等に分割して量子化を行う。なお、（１）式において｛｝は小数点以下の切り捨て処理を意味する。かくしてオーディオ波形上の６つの波形データが、それぞれ例えば８ビット（ｍ＝８）で構成されているとすると、これらはＡＤＲＣ回路部においてそれぞれが２ビットに圧縮される。
【００２１】
このようにしてダイナミックレンジで正規化され圧縮されたオーディオ波形データをそれぞれｑ_n（ｎ＝１〜６）とすると、クラス分類部１４に設けられたクラスコード発生回路部は、圧縮されたオーディオ波形データｑ_nに基づいて、次式、
【００２２】
【数２】

【００２３】
に示す演算を実行することにより、そのブロック（ｑ₁〜ｑ₆）が属するクラスを示すクラスコードclass を算出すると共に、当該算出されたオーディオ波形データＤ１２に基づくクラスコード classに対して、上述の極性クラス CLASSを統合した後、当該統合されたクラスコード class′を表すクラスコードデータＤ１４を予測係数メモリ１５に供給する。このクラスコードclass ′は、予測係数メモリ１５から予測係数を読み出す際の読み出しアドレスを示す。因みに（２）式において、ｎは圧縮されたオーディオ波形データｑ_nの数を表し、この実施の形態の場合ｎ＝６であり、またＰはビット割り当てを表し、この実施の形態の場合Ｐ＝２である。
【００２４】
このようにして、クラス分類部１４はクラス分類部抽出部１２において入力オーディオデータＤ１０から切り出されたオーディオ波形データＤ１２そのもののクラスコード classと、オーディオ波形データＤ１２の極性クラス CLASSとを統合したクラスコードデータ（ class′）Ｄ１４を生成し、これを予測係数メモリ１５に供給する。因みに、オーディオ波形データＤ１２そのもののクラスコード classと、オーディオ波形データＤ１２の極性クラス CLASSとを統合する方法として、クラス分類部１４は例えばオーディオ波形データＤ１２そのもののクラスコード classに極性クラス CLASSを付加することにより、これらを統合することができる。
【００２５】
予測係数メモリ１５には、各クラスコードに対応する予測係数のセットがクラスコードに対応するアドレスにそれぞれ記憶されており、クラス分類部１４から供給されるクラスコードデータＤ１４に基づいて、当該クラスコードに対応するアドレスに記憶されている予測係数のセットｗ₁〜ｗ_nが読み出され、予測演算部１６に供給される。
【００２６】
予測演算部１６は、予測演算部抽出部１３において入力オーディオデータＤ１０から時間軸領域で切り出された予測演算しようとするオーディオ波形データ（予測タップ）Ｄ１３（ｘ₁〜ｘ_n）と、予測係数ｗ₁〜ｗ_nに対して、次式
【００２７】
【数３】

【００２８】
に示す積和演算を行うことにより、予測結果ｙ′を得る。この予測値ｙ′が、音質が改善されたオーディオデータＤ１６として予測演算部１６から出力される。
【００２９】
なお、オーディオ信号処理装置１０の構成として図１について上述した機能ブロックを示したが、この機能ブロックを構成する具体的構成として、この実施の形態においては図３に示すコンピュータ構成の装置を用いる。すなわち、図３において、オーディオ信号処理装置１０は、バスＢＵＳを介してＣＰＵ２１、ＲＯＭ(Read Only Memory)２２、予測係数メモリ１５を構成するＲＡＭ(Random Access Memory)１５、及び各回路部がそれぞれ接続された構成を有し、ＣＰＵ１１はＲＯＭ２２に格納されている種々のプログラムを実行することにより、図１について上述した各機能ブロック（極性判別部１１、クラス分類部抽出部１２、予測演算部抽出部１３、クラス分類部１４及び予測演算部１６）として動作するようになされている。
【００３０】
また、オーディオ信号処理装置１０にはネットワークとの間で通信を行う通信インターフェース２４、フロッピィディスクや光磁気ディスク等の外部記憶媒体から情報を読み出すリムーバブルドライブ２８を有し、ネットワーク経由又は外部記憶媒体から図１について上述したクラス分類適用処理を行うための各プログラムをハードディスク装置２５のハードディスクに読み込んみ、当該読み込まれたプログラムに従ってクラス分類適応処理を行うこともできる。
【００３１】
ユーザは、キーボードやマウス等の入力手段２６を介して種々のコマンドを入力することにより、ＣＰＵ２１に対して図１について上述したクラス分類処理を実行させる。この場合、オーディオ信号処理装置１０はデータ入出力部２７を介して音質を向上させようとするオーディオデータ（入力オーディオデータ）Ｄ１０を入力し、当該入力オーディオデータＤ１０に対してクラス分類適用処理を施した後、音質が向上したオーディオデータＤ１６をデータ入出力部２７を介して外部に出力し得るようになされている。
【００３２】
因みに、図４はオーディオ信号処理装置１０におけるクラス分類適応処理の処理手順を示し、オーディオ信号処理装置１０はステップＳＰ１１から当該処理手順に入ると、続くステップＳＰ１２において入力オーディオデータＤ１０の極性を極性判別部１１において算出する。
【００３３】
この算出された極性はオーティオ波形データＤ１２のクラス分類を一段と確実にするためのもであり、オーディオ信号処理装置１０は、ステップＳＰ１３においてクラス分類部１４によりオーディオ波形データＤ１２及び極性クラスＤ１１に基づいてオーディオ波形データＤ１２をクラス分類する。そしてオーディオ信号処理装置１０は、クラス分類の結果得られたクラスコードを用いて予測係数メモリ１５から予測係数を読み出す。この予測係数は予め学習によりクラス毎に対応して格納されており、オーディオ信号処理装置１０はクラスコードに対応した予測係数を読み出すことにより、このときのオーディオ波形の特徴に合致した予測係数を用いることができる。
【００３４】
予測係数メモリ１５から読み出された予測係数は、ステップＳＰ１４において予測演算部１６の予測演算に用いられる。これにより、入力オーディオデータＤ１０はその極性に応じた予測演算により、所望とするオーディオデータＤ１６に変換される。かくして入力オーディオデータＤ１０はその音質が改善されたオーディオデータＤ１６に変換され、オーディオ信号処理装置１０はステップＳＰ１５に移って当該処理手順を終了する。
【００３５】
次に、図１について上述した予測係数メモリ１５に記憶するクラス毎の予測係数のセットを予め学習によって得るための学習回路について説明する。
【００３６】
図５において、学習回路３０は、高音質の教師オーディオデータＤ３０を生徒信号生成フィルタ３７に受ける。生徒信号生成フィルタ３７は、間引き率設定信号Ｄ３９により設定された間引き率で教師オーディオデータＤ３０を所定時間ごとに所定サンプル間引くようになされている。
【００３７】
この場合、生徒信号生成フィルタ３７における間引き率によって、生成される予測係数が異なり、これに応じて上述のオーディオ信号処理装置１０で再現されるオーディオデータも異なる。例えば、上述のオーディオ信号処理装置１０においてサンプリング周波数を高くすることでオーディオデータの音質を向上しようとする場合、生徒信号生成フィルタ３７ではサンプリング周波数を減らす間引き処理を行う。また、これに対して上述のオーディオ信号処理装置１０において入力オーディオデータＤ１０の欠落したデータサンプルを補うことで音質の向上を図る場合には、これに応じて、生徒信号生成フィルタ３７ではデータサンプルを欠落させる間引き処理を行うようになされている。
【００３８】
かくして、生徒信号生成フィルタ３７は教師オーディオデータ３０から所定の間引き処理により生徒オーディオデータＤ３７を生成し、これを極性判別部３１、クラス分類部抽出部３２及び予測演算部抽出部３３にそれぞれ供給する。
【００３９】
極性判別部３１は生徒信号生成フィルタ３７から供給された生徒オーディオデータＤ３７を所定時間毎の領域（この実施の形態の場合、例えば６サンプル毎とする）に分割した後、当該分割された各時間領域の波形について、その極性クラスを図２について上述したように分類する。
【００４０】
そして極性判別部３１は生徒オーディオデータＤ３７のこのとき分割された時間領域の極性判別結果を生徒オーディオデータＤ３７の極性クラスデータＤ３１としてクラス分類部３４に供給する。
【００４１】
また、クラス分類部抽出部３２は生徒信号生成フィルタ３７から供給された生徒オーディオデータＤ３７を、極性判別部３１の場合と同様の時間領域（この実施の形態の場合例えば６サンプル）に分割することによりクラス分類しようとするオーディオ波形データＤ３２を抽出し、これをクラス分類部３４に供給する。
【００４２】
クラス分類部３４は、クラス分類抽出部３２において切り出されたオーディオ波形データＤ３２について、当該オーディオ波形データＤ３２を圧縮して圧縮データパターンを生成するＡＤＲＣ(Adaptive Dynamic Range Coding) 回路部と、オーディオ波形データＤ３２の属するクラスコードを発生するクラスコード発生回路部とを有する。
【００４３】
ＡＤＲＣ回路部はオーディオ波形データＤ３２に対して、例えば８ビットから２ビットに圧縮するような演算を行うことによりパターン圧縮データを形成する。このＡＤＲＣ回路部は、適応的量子化を行うものであり、ここでは、信号レベルの局所的なパターンを短い語長で効率的に表現することができるので、信号パターンのクラス分類のコード発生用に用いられる。
【００４４】
具体的には、オーディオ波形上の６つの８ビットのデータ（オーディオ波形データ）をクラス分類しようとする場合、２⁴⁸という膨大な数のクラスに分類しなければならず、回路上の負担が多くなる。そこで、この実施の形態のクラス分類部１４ではその内部に設けられたＡＤＲＣ回路部で生成されるパターン圧縮データに基づいてクラス分類を行う。例えば６つのオーディオ波形データに対して１ビットの量子化を実行すると、６つのオーディオ波形データを６ビットで表すことができ、２⁶＝６４クラスに分類することができる。
【００４５】
ここで、ＡＤＲＣ回路部は、切り出された領域内のオーディオ波形のダイナミックレンジをＤＲ、ビット割り当をｍ、各オーディオ波形データのデータレベルをＬ、量子化コードをＱとして、上述の（１）式と同様の演算により、領域内の最大値ＭＡＸと最小値ＭＩＮとの間を指定されたビット長で均等に分割して量子化を行う。かくしてオーディオ波形上の６つの波形データが、それぞれ例えば８ビット（ｍ＝８）で構成されているとすると、これらはＡＤＲＣ回路部においてそれぞれが２ビットに圧縮される。
【００４６】
このようにしてオーディオ波形のダイナミックレンジで正規化し圧縮されたオーディオ波形データをそれぞれｑ_n（ｎ＝１〜６）とすると、クラス分類部３４に設けられたクラスコード発生回路部は、圧縮されたオーディオ波形データｑ_nに基づいて、上述の（２）式と同様の演算を実行することにより、そのブロック（ｑ₁〜ｑ₆）が属するクラスを示すクラスコードclass を算出し、当該算出されたクラスコードclass と極性判別部３１により算出された極性クラス（ CLASS０、 CLASS１、 CLASS２又は CLASS３）とを統合した後、当該統合されてなるクラスコード class′を表すクラスコードデータＤ３４を予測係数算出部３６に供給する。因みに（２）式において、ｎは圧縮されたオーディオ波形データｑ_nの数を表し、この実施の形態の場合ｎ＝６であり、またＰはビット割り当てを表し、この実施の形態の場合Ｐ＝２である。
【００４７】
このようにして、クラス分類部３４はクラスコードデータＤ３４を生成し、これを予測係数算出部３６に供給する。また、予測係数算出部３６には、クラスコードデータＤ３４に対応した時間軸領域のオーディオ波形データＤ３３（ｘ₁、ｘ₂、……、ｘ_n）が予測演算部抽出部３３において切り出されて供給される。
【００４８】
予測係数算出部３６は、クラス分類部３４から供給されたクラスコードclass ′と、各クラスコードclass 毎に切り出されたオーディオ波形データＤ３３と、入力端Ｔ_INから供給された高音質の教師オーディオデータＤ３０とを用いて、正規方程式を立てる。
【００４９】
すなわち、生徒オーディオデータＤ３７のｎサンプルのレベルをそれぞれｘ₁、ｘ₂、……、ｘ_nとして、それぞれにｐビットのＡＤＲＣを行った結果の量子化データをｑ₁、……、ｑ_nとする。このとき、この領域のクラスコードclass ′を上述の（２）式のように定義する。そして、上述のように生徒オーディオデータＤ３７のレベルをそれぞれ、ｘ₁、ｘ₂、……、ｘ_nとし、高音質の教師オーディオデータＤ３０のレベルをｙとしたとき、クラスコード毎に、予測係数ｗ₁、ｗ₂、……、ｗ_nによるｎタップの線形推定式を設定する。これを次式、
【００５０】
【数４】

【００５１】
とする。学習前は、ｗ_nが未定係数である。
【００５２】
学習回路３０では、クラスコード毎に、複数のオーディオデータに対して学習を行う。データサンプル数がＭの場合、上述の（４）式に従って、次式、
【００５３】
【数５】

【００５４】
が設定される。但しｋ＝１、２、……Ｍである。
【００５５】
Ｍ＞ｎの場合、予測係数ｗ₁、……ｗ_nは一意的に決まらないので、誤差ベクトルｅの要素を次式、
【００５６】
【数６】

【００５７】
によって定義し（但し、ｋ＝１、２、……、Ｍ）、次式、
【００５８】
【数７】

【００５９】
を最小にする予測係数を求める。いわゆる、最小自乗法による解法である。
【００６０】
ここで、（７）式によるｗ_nの偏微分係数を求める。この場合、次式、
【００６１】
【数８】

【００６２】
を「０」にするように、各ｗ_n（ｎ＝１〜６）を求めれば良い。
【００６３】
そして、次式、
【００６４】
【数９】

【００６５】
【数１０】

【００６６】
のように、Ｘ_ij、Ｙ_iを定義すると、（８）式は行列を用いて次式、
【００６７】
【数１１】

【００６８】
として表される。
【００６９】
この方程式は、一般に正規方程式と呼ばれている。なお、ここではｎ＝６である。
【００７０】
全ての学習用データ（教師オーディオデータＤ３０、クラスコードclass ′、オーディオ波形データＤ３３）の入力が完了した後、予測係数算出部３６は各クラスコードclass ′に上述の（１１）式に示した正規方程式を立てて、この正規方程式を掃き出し法等の一般的な行列解法を用いて、各Ｗ_nについて解き、各クラスコード毎に、予測係数を算出する。予測係数算出部３６は、算出された各予測係数（Ｄ３６）を予測係数メモリ１５に書き込む。
【００７１】
このような学習を行った結果、予測係数メモリ１５には、量子化データｑ₁、……、ｑ₆で規定されるパターン毎に、高音質のオーディオデータｙを推定するための予測係数が、各クラスコード毎に格納される。この予測係数メモリ１５は、図１について上述したオーディオ信号処理装置１０において用いられる。かかる処理により、線形推定式に従って通常のオーディオデータから高音質のオーディオデータを作成するための予測係数の学習が終了する。
【００７２】
このように、学習回路３０は、オーディオ信号処理装置１０において補間処理を行う程度を考慮して、生徒信号生成フィルタ３７で高音質の教師オーディオデータの間引き処理を行うことにより、オーディオ信号処理装置１０における補間処理のための予測係数を生成することができる。
【００７３】
以上の構成において、オーディオ信号処理装置１０は、クラス分類部１４のＡＤＲＣ処理においてオーディオ波形をそのダイナミックレンジで正規化することでオーディオ波形そのもののクラスコード classを得る。この場合、オーディオ波形のゼロレベル近傍及び大振幅部では音素が異なっている場合が多く、単にダイナミックレンジで正規化した結果でクラス分類を行うと、元々異なる音素であっても同一クラスと見なされてしまうことがある。従って、クラス分類部１４では、オーディオ波形そのもののクラスコード classに、オーディオ波形の極性クラス CLASSを統合してクラスコード class′を算出し、これをクラス分類結果として予測演算に用いることにより、オーディオ波形そのものから得られたクラスコード classが同一クラスとなった場合でも、オーディオ波形の極性クラスに応じて確実にクラス分類することができる。
【００７４】
例えば、極性クラスが CLASS０又は CLASS３である場合、このことは切り出されたオーディオ波形データの値が全て正又は負であること、すなわち比較的大振幅の波形部分であることを表しており、また、極性クラスが CLASS１又は CLASS２である場合、このことは切り出されたオーディオ波形がゼロクロス部と正又は負とを含む波形であること、すなわち比較的ゼロレベル近傍の波形部分であることを表しており、クラス分類部１４はかかる極性クラスをオーディオ波形データそのもののクラスコード classに統合してクラス分類を行うことにより、異なる音素を異なるクラスコードとして分類することができる。
【００７５】
以上の構成によれば、入力オーディオデータＤ１０の極性クラスを用いて入力オーディオデータＤ１０をクラス分類し、当該クラス分類された結果に基づく予測係数を用いて予測演算するようにしたことにより、入力オーディオデータＤ１０を一段と高音質のオーディオデータＤ１６に変換することができる。
【００７６】
なお上述の実施の形態においては、オーディオ信号処理装置１０及び学習装置３０において、クラス分類部抽出部１２、３２及び予測演算部抽出部１３、３３により入力オーディオデータＤ１０、Ｄ３７を常に一定の範囲毎に切り出す場合について述べたが、本発明はこれに限らず、例えば図１及び図５との対応部分に同一符号を付して示す図６及び図７に示すように、極性判別部１１、３１において算出された極性クラスに基づいて抽出制御信号ＣＯＮＴ１１、ＣＯＮＴ３１を可変クラス分類部抽出部１２′、可変予測演算部抽出部１３′及び可変クラス分類部抽出部３２′、可変予測演算部抽出部３３′に供給することにより入力オーディオデータＤ１０、Ｄ３７の切り出し範囲（タップ）を制御するようにしても良い。
【００７７】
この場合、極性判別部１１、３１は、極性クラス CLASS０、 CLASS１、 CLASS２及び CLASS３の頻度に基づいて切り出し範囲（タップの切り出し長）を制御することにより、タップの切り出し長を長くし過ぎることによる正極性のみ（ CLASS０）又は負極性のみ（ CLASS３）への分類頻度の低下を防止することができる。
【００７８】
この場合、オーディオデータの変換処理手順は図４との対応部分に同一符号を付して示す図８に示すように、オーディオ波形の極性を判別するステップＳＰ１２の次に、当該判別された極性に基づいて可変クラス分類部抽出部１２′、３２′及び可変予測演算部抽出部１３′、３３′におけるタップ抽出領域を制御する処理ステップＳＰ２１を挿入するようにすれば良い。
【００７９】
また上述の実施の形態においては、極性クラスとして４つの極性クラス CLASS０、 CLASS１、 CLASS２及び CLASS３を設ける場合について述べたが、本発明はこれに限らず、全て正の領域、全て負の領域、ゼロクロスを含む領域の３つの極性クラスに分類するようにしても良い。
【００８０】
また上述の実施の形態においては、予測方式として線形一次による手法を用いる場合について述べたが、本発明はこれに限らず、要は学習した結果を用いるようにすれば良く、例えば多次関数による手法等の種々の予測方式を適用することができる。
【００８１】
また上述の実施の形態においては、クラス分類部１４においてＡＤＲＣにより圧縮データパターンを生成する場合について述べたが、本発明はこれに限らず、可逆符号化（ＤＰＣＭ:Differrential Pulse Code Modulation) 又はベクトル量子化（ＶＱ:Vector Quantize) 等の圧縮手段を用いるようにしても良い。
【００８２】
また上述の実施の形態においては、学習回路３０の生徒信号生成フィルタ３７において教師オーディオデータＤ３０から所定サンプル数を間引く場合について述べたが、本発明はこれに限らず、例えばビット数を削減する等、他の種々の方法を適用することができる。
【００８３】
【発明の効果】
かかる課題を解決するため本発明においては、ゼロレベルを基準としてディジタルオーディオ信号の極性に基づいてディジタルオーディオ信号のクラスを分類し、当該分類されたクラスに対応した予測方式でディジタルオーディオ信号を変換するようにしたことにより、音素に応じてクラス分類することができるので、一段とディジタルオーディオ信号の特徴に適応した変換を行うことができる。
【図面の簡単な説明】
【図１】本発明によるディジタル信号処理装置の構成を示すブロック図である。
【図２】極性判別の説明に供する信号波形図である。
【図３】オーディオ信号処理装置の構成を示すブロック図である。
【図４】オーディオ信号変換処理手順を示すフローチャートである。
【図５】本発明による学習装置の構成を示すブロック図である。
【図６】ディジタル信号処理装置の他の実施の形態を示すブロック図である。
【図７】学習装置の他の実施の形態を示すブロック図である。
【図８】他の実施の形態によるオーディオ信号変換処理手順を示すフローチャートである。
【符号の説明】
１０……オーディオ信号処理装置、１１、３１……極性判別部、１４、３４……クラス分類部、１５……予測係数メモリ、１６……予測演算部、３６……予測係数算出部、３７……生徒信号生成フィルタ。
[0001]
BACKGROUND OF THE INVENTION
The present invention relates to a digital signal processing method, a learning method, apparatuses therefor, and a program storage medium, and is suitably applied to a digital signal processing method, a learning method, apparatuses therefor, and a program storage medium that perform data interpolation processing on a digital signal in a rate converter, a PCM (Pulse Code Modulation) decoding device, or the like.
[0002]
[Prior art]
Conventionally, before a digital audio signal is input to a digital/analog converter, oversampling processing is performed to convert the sampling frequency to several times its original value. As a result, for the digital audio signal output from the digital/analog converter, the phase characteristic of the analog anti-alias filter is kept constant in the upper audible frequency range, and the influence of digital image noise associated with sampling is eliminated.
[0003]
In such oversampling processing, a digital filter based on first-order linear (straight-line) interpolation is usually used. When the sampling rate changes or data is lost, such a digital filter obtains the average value of a plurality of existing data and generates linearly interpolated data.
[0004]
[Problems to be solved by the invention]
However, although the digital audio signal after oversampling has several times as much data along the time axis as a result of first-order linear interpolation, its frequency band is not much different from that before the conversion, and the sound quality itself is not improved. Furthermore, since the interpolated data is not necessarily generated on the basis of the waveform of the analog audio signal before A/D conversion, the waveform reproducibility is hardly improved either.
[0005]
In addition, when digital audio signals with different sampling frequencies are dubbed, the frequency is converted using a sampling rate converter. Even in this case, the first-order linear digital filter can only interpolate data along a straight line, so it has been difficult to improve sound quality and waveform reproducibility. The same applies when data samples of the digital audio signal are lost.
[0006]
The present invention has been made in consideration of the above points, and an object of the present invention is to propose a digital signal processing method, a learning method, an apparatus thereof, and a program storage medium that can further improve the digital signal waveform reproducibility.
[0007]
[Means for Solving the Problems]
In order to solve such a problem, in the present invention, a class of a digital audio signal is classified based on the polarity of the digital audio signal with reference to a zero level, and the digital audio signal is converted by a prediction method corresponding to the classified class. By doing so, it is possible to classify according to phonemes, so that conversion adapted to the characteristics of the digital audio signal can be performed.
[0008]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, an embodiment of the present invention will be described in detail with reference to the drawings.
[0009]
In FIG. 1, an audio signal processing apparatus 10 is configured to generate audio data close to the true value by class classification adaptive processing when raising the sampling rate of a digital audio signal (hereinafter referred to as audio data) or interpolating audio data. Incidentally, a digital audio signal here means a voice signal representing a voice uttered by a person or an animal, a musical sound signal representing a sound produced by a musical instrument, or a signal representing any other sound.
[0010]
That is, in the audio signal processing apparatus 10, the polarity determination unit 11 divides the input audio data D10 shown in FIG. 2, supplied from the input terminal T_IN, into regions of a predetermined time length (in this embodiment, for example, every 6 samples), and then determines the polarity class of the waveform in each divided time region by the polarity determination method shown in FIG. 2.
[0011]
That is, in FIG. 2, when all the extracted taps are positive, as in the divided region AR1 (current data C1), the polarity class is CLASS0. When a zero cross exists in the extracted region and the current data (C2) is positive, as in the divided region AR2, the polarity class is CLASS1. When a zero cross exists in the extracted region and the current data (C4) is negative, as in the divided region AR4, the polarity class is CLASS2. When all the extracted taps are negative, as in the divided region AR3 (current data C3), the polarity class is CLASS3.
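The four cases of FIG. 2 can be sketched in code as follows. This is only an illustrative sketch: the function name `polarity_class`, the `current_index` parameter (the position of the current data within the window), and the treatment of a sample exactly equal to zero as non-positive are assumptions that the text does not specify.

```python
def polarity_class(taps, current_index):
    """Return the polarity class (0 to 3) of one window of samples.

    Assumption: a sample equal to zero is neither positive nor negative,
    so a window containing it is handled as a zero-cross window, and a
    current sample equal to zero falls into CLASS2.
    """
    if all(s > 0 for s in taps):
        return 0  # CLASS0: every tap is positive
    if all(s < 0 for s in taps):
        return 3  # CLASS3: every tap is negative
    # Mixed signs: the window contains a zero cross.
    return 1 if taps[current_index] > 0 else 2  # CLASS1 / CLASS2


# Example windows of 6 samples, with the current data at index 3.
print(polarity_class([3, 5, 8, 6, 4, 2], 3))
print(polarity_class([-2, -1, 1, 3, 4, 5], 3))
```

Passing the current-data position as a parameter keeps the sketch independent of where the embodiment places the current data inside the 6-sample window.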
[0012]
By setting a polarity class based on the polarity of the audio data D10 in this way, the phoneme present when the audio data D10 is near the zero level and the phoneme in a large-amplitude portion can be distinguished more clearly in both the positive and negative regions.
[0013]
The polarity discriminating unit 11 supplies the polarity discriminating result (CLASS0, CLASS1, CLASS2 or CLASS3) obtained corresponding to the current data of the input audio data D10 at this time to the class classification unit 14 as the polarity class data D11.
[0014]
Further, the class classification unit extraction unit 12 divides the input audio data D10 supplied from the input terminal T_IN into the same time regions as the polarity determination unit 11 (for example, 6 samples in this embodiment), thereby extracting the audio waveform data D12 to be classified, and supplies it to the class classification unit 14.
[0015]
The class classification unit 14 includes an ADRC (Adaptive Dynamic Range Coding) circuit unit, which compresses the audio waveform data D12 extracted by the class classification unit extraction unit 12 to generate a compressed data pattern, and a class code generation circuit unit, which generates the class code to which the audio waveform data D12 belongs.
[0016]
The ADRC circuit unit forms pattern compression data by performing, on the audio waveform data D12, an operation that compresses it from, for example, 8 bits to 2 bits. This ADRC circuit unit performs adaptive quantization. Because it can efficiently express a local pattern of the signal level with a short word length, it is used here to generate codes for classifying signal patterns.
[0017]
Specifically, classifying six 8-bit data items (audio waveform data) on the audio waveform directly would require an enormous number of classes, 2^48, which places a heavy burden on the circuit. Therefore, the class classification unit 14 of this embodiment performs class classification based on the pattern compression data generated by the ADRC circuit unit provided therein. For example, if 1-bit quantization is performed on six audio waveform data items, they can be represented by 6 bits and classified into 2^6 = 64 classes.
[0018]
Here, letting DR be the dynamic range of the audio waveform in the extracted region, m the bit allocation, L the data level of each audio waveform data item, and Q the quantization code, the ADRC circuit unit applies the following equation:
[0019]
[Expression 1]

[0020]
In accordance with equation (1), quantization is performed by dividing the interval between the maximum value MAX and the minimum value MIN in the region equally with the designated bit length. In equation (1), {} denotes truncation of the fractional part. Thus, if the six waveform data items on the audio waveform each consist of, for example, 8 bits (m = 8), each of them is compressed to 2 bits in the ADRC circuit unit.
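Expression (1) itself is not reproduced in this text. A form commonly used for ADRC that matches the definitions given here is DR = MAX - MIN + 1 and Q = {(L - MIN + 0.5) × 2^m / DR}, and the sketch below assumes that form. The function name `adrc_quantize` and the use of integer sample levels are likewise assumptions made for illustration.

```python
def adrc_quantize(levels, bits):
    """Requantize integer sample levels in one window to `bits` bits each.

    Assumed form of equation (1):
        DR = MAX - MIN + 1
        Q  = floor((L - MIN + 0.5) * 2**bits / DR)
    Integer arithmetic is used so that the truncation is exact.
    """
    lo, hi = min(levels), max(levels)
    dr = hi - lo + 1  # dynamic range of the window
    # (L - MIN + 0.5) * 2**bits / DR == (2*(L - MIN) + 1) * 2**bits / (2*DR)
    return [((2 * (x - lo) + 1) << bits) // (2 * dr) for x in levels]


codes = adrc_quantize([10, 20, 30, 40, 50, 60], bits=2)
print(codes)  # [0, 0, 1, 2, 3, 3]
```

Because the codes depend only on each level's position between MIN and MAX, two windows with the same shape but different offsets or amplitudes produce the same pattern, which is what makes the result usable as a compact waveform class.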
[0021]
Letting q_n (n = 1 to 6) denote the audio waveform data normalized by the dynamic range and compressed in this way, the class code generation circuit unit provided in the class classification unit 14 evaluates, on the basis of the compressed audio waveform data q_n, the following equation:
[0022]
[Expression 2]

[0023]
By executing this operation, the class code generation circuit unit calculates the class code class indicating the class to which the block (q_1 to q_6) belongs, integrates the polarity class CLASS described above with the class code class calculated from the audio waveform data D12, and then supplies class code data D14 representing the integrated class code class′ to the prediction coefficient memory 15. The class code class′ indicates the read address used when prediction coefficients are read from the prediction coefficient memory 15. Incidentally, in equation (2), n represents the number of compressed audio waveform data q_n, which is n = 6 in this embodiment, and P represents the bit allocation, which is P = 2 in this embodiment.
[0024]
In this way, the class classification unit 14 generates class code data (class′) D14, in which the class code class of the audio waveform data D12 itself, extracted from the input audio data D10 by the class classification unit extraction unit 12, is integrated with the polarity class CLASS of the audio waveform data D12, and supplies it to the prediction coefficient memory 15. Incidentally, as a method of integrating the class code class of the audio waveform data D12 itself with its polarity class CLASS, the class classification unit 14 can, for example, append the polarity class CLASS to the class code class.
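One way to picture the class code and its integration with the polarity class is the sketch below. Expression (2) is not reproduced in this text, so the packing used here (the codes q_1 to q_n treated as digits in base 2^P, with q_1 least significant) is an assumption, as is placing the polarity class in the bits above the waveform class code. The names `class_code` and `integrated_class_code` are hypothetical.

```python
def class_code(q, bits):
    """Pack ADRC codes q_1..q_n into one integer (assumed digit order)."""
    code = 0
    for i, qi in enumerate(q):
        code += qi * (1 << bits) ** i  # q_1 is the least significant digit
    return code


def integrated_class_code(q, bits, polarity):
    """Append the polarity class (0 to 3) above the waveform class code."""
    return (polarity << (bits * len(q))) | class_code(q, bits)


q = [0, 0, 1, 2, 3, 3]                      # six 2-bit ADRC codes
print(class_code(q, 2))                     # 3984
print(integrated_class_code(q, 2, 1))       # 8080
```

With n = 6, P = 2 and four polarity classes, this layout yields 4 × 4^6 = 16384 distinct addresses, so the integrated code can index a coefficient table directly.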
[0025]
In the prediction coefficient memory 15, the set of prediction coefficients corresponding to each class code is stored at the address corresponding to that class code. Based on the class code data D14 supplied from the class classification unit 14, the set of prediction coefficients w_1 to w_n stored at the address corresponding to the class code is read out and supplied to the prediction calculation unit 16.
[0026]
The prediction calculation unit 16 performs, on the audio waveform data (prediction taps) D13 (x_1 to x_n) to be used for the prediction calculation, which the prediction calculation unit extraction unit 13 extracts from the input audio data D10 in the time-axis domain, and on the prediction coefficients w_1 to w_n, the product-sum operation given by the following equation:
[0027]
[Equation 3]

[0028]
The product-sum operation shown in the above equation yields the prediction result y′. This predicted value y′ is output from the prediction calculation unit 16 as audio data D16 with improved sound quality.
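The coefficient lookup and product-sum operation can be sketched as follows, taking the prediction to be the linear form y′ = w_1 x_1 + ... + w_n x_n that the description calls an n-tap linear estimation. The dictionary `coefficient_memory`, its single entry, and the coefficient values are made-up illustrative data, not values from the patent.

```python
def predict(taps, coefficients):
    """Product-sum of prediction taps x_1..x_n and coefficients w_1..w_n."""
    if len(taps) != len(coefficients):
        raise ValueError("taps and coefficients must have the same length")
    return sum(w * x for w, x in zip(coefficients, taps))


# Hypothetical coefficient memory: class code -> learned coefficient set.
coefficient_memory = {
    8080: [0.0, 0.0, 0.5, 0.5, 0.0, 0.0],
}

taps = [1, 2, 3, 5, 6, 7]
print(predict(taps, coefficient_memory[8080]))  # 4.0
```

The illustrative coefficient set simply averages the two middle taps, which corresponds to plain linear interpolation; learned coefficients would differ from class to class.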
[0029]
The functional blocks described above with reference to FIG. 1 show the configuration of the audio signal processing apparatus 10. As a specific configuration implementing these functional blocks, this embodiment uses the computer-based apparatus shown in FIG. 3. That is, in FIG. 3, the audio signal processing apparatus 10 has a configuration in which a CPU 21, a ROM (Read Only Memory) 22, a RAM (Random Access Memory) 15 constituting the prediction coefficient memory 15, and the respective circuit units are connected to one another via a bus BUS. The CPU 21 executes various programs stored in the ROM 22 and thereby operates as each of the functional blocks described above with reference to FIG. 1 (the polarity determination unit 11, the class classification unit extraction unit 12, the prediction calculation unit extraction unit 13, the class classification unit 14, and the prediction calculation unit 16).
[0030]
The audio signal processing apparatus 10 also has a communication interface 24 that communicates with a network and a removable drive 28 that reads information from an external storage medium such as a floppy disk or a magneto-optical disk. The programs for performing the class classification adaptive processing described above with reference to FIG. 1 can be read into the hard disk of the hard disk device 25 via the network or from the external storage medium, and the class classification adaptive processing can then be performed according to the programs thus read.
[0031]
The user inputs various commands through input means 26 such as a keyboard and a mouse, thereby causing the CPU 21 to execute the class classification processing described above with reference to FIG. 1. In this case, the audio signal processing apparatus 10 receives, via the data input/output unit 27, the audio data (input audio data) D10 whose sound quality is to be improved, applies the class classification adaptive processing to the input audio data D10, and can then output the audio data D16 with improved sound quality to the outside via the data input/output unit 27.
[0032]
Incidentally, FIG. 4 shows the processing procedure of the class classification adaptive processing in the audio signal processing apparatus 10. When the audio signal processing apparatus 10 enters the processing procedure at step SP11, it calculates the polarity of the input audio data D10 in the polarity determination unit 11 in the following step SP12.
[0033]
The calculated polarity serves to make the classification of the audio waveform data D12 more reliable. In step SP13, the audio signal processing apparatus 10 classifies the audio waveform data D12 in the class classification unit 14 on the basis of the audio waveform data D12 and the polarity class D11. The audio signal processing apparatus 10 then reads prediction coefficients from the prediction coefficient memory 15 using the class code obtained as a result of the classification. These prediction coefficients are stored in advance for each class through learning, and by reading the prediction coefficients corresponding to the class code, the audio signal processing apparatus 10 can use prediction coefficients that match the characteristics of the audio waveform at that time.
[0034]
The prediction coefficient read from the prediction coefficient memory 15 is used for the prediction calculation of the prediction calculation unit 16 in step SP14. As a result, the input audio data D10 is converted into desired audio data D16 by a prediction calculation according to the polarity. Thus, the input audio data D10 is converted into the audio data D16 with improved sound quality, and the audio signal processing apparatus 10 proceeds to step SP15 and ends the processing procedure.
[0035]
Next, a learning circuit for obtaining in advance a set of prediction coefficients for each class stored in the prediction coefficient memory 15 described above with reference to FIG. 1 will be described.
[0036]
In FIG. 5, the learning circuit 30 receives high-quality teacher audio data D30 at the student signal generation filter 37. The student signal generation filter 37 is configured to thin out the teacher audio data D30 by a predetermined number of samples every predetermined time at a thinning rate set by the thinning rate setting signal D39.
[0037]
In this case, the generated prediction coefficients differ depending on the thinning rate in the student signal generation filter 37, and the audio data reproduced by the audio signal processing apparatus 10 described above differs accordingly. For example, when the audio signal processing apparatus 10 is to improve the sound quality of audio data by raising the sampling frequency, the student signal generation filter 37 performs thinning that lowers the sampling frequency. On the other hand, when the audio signal processing apparatus 10 is to improve sound quality by filling in missing data samples of the input audio data D10, the student signal generation filter 37 correspondingly performs thinning that drops data samples.
[0038]
Thus, the student signal generation filter 37 generates student audio data D37 from the teacher audio data D30 by the predetermined thinning process, and supplies it to the polarity determination unit 31, the class classification unit extraction unit 32, and the prediction calculation unit extraction unit 33.
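The two kinds of thinning described for the student signal generation filter can be sketched as follows. The function names, the `factor`, `period` and `drop_index` parameters, and the choice of which samples to discard are illustrative assumptions; the text only says that samples are thinned at a set rate.

```python
def thin_sampling_rate(teacher, factor):
    """Keep every `factor`-th sample, lowering the sampling frequency."""
    return teacher[::factor]


def thin_drop_samples(teacher, period, drop_index):
    """Drop one sample out of every `period`, imitating lost data samples."""
    return [s for i, s in enumerate(teacher) if i % period != drop_index]


teacher = list(range(12))                 # stand-in for teacher audio data D30
print(thin_sampling_rate(teacher, 2))     # [0, 2, 4, 6, 8, 10]
print(thin_drop_samples(teacher[:6], 3, 1))  # [0, 2, 3, 5]
```

Matching the thinning used in learning to the degradation expected at run time is what lets the learned coefficients undo that particular degradation.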
[0039]
The polarity determination unit 31 divides the student audio data D37 supplied from the student signal generation filter 37 into regions of a predetermined time length (in this embodiment, for example, every 6 samples), and then classifies the polarity class of the waveform in each divided time region as described above with reference to FIG. 2.
[0040]
The polarity determination unit 31 then supplies the polarity determination result for the time region of the student audio data D37 divided at this time to the class classification unit 34 as polarity class data D31 of the student audio data D37.
[0041]
Further, the class classification unit extraction unit 32 divides the student audio data D37 supplied from the student signal generation filter 37 into the same time region as that of the polarity determination unit 31 (for example, 6 samples in this embodiment). Thus, the audio waveform data D32 to be classified is extracted and supplied to the class classification unit 34.
[0042]
The class classification unit 34 includes an ADRC (Adaptive Dynamic Range Coding) circuit unit, which compresses the audio waveform data D32 extracted by the class classification unit extraction unit 32 to generate a compressed data pattern, and a class code generation circuit unit, which generates the class code to which the audio waveform data D32 belongs.
[0043]
The ADRC circuit unit forms pattern compression data by performing, on the audio waveform data D32, an operation that compresses it from, for example, 8 bits to 2 bits. This ADRC circuit unit performs adaptive quantization. Because it can efficiently express a local pattern of the signal level with a short word length, it is used here to generate codes for classifying signal patterns.
[0044]
Specifically, classifying six 8-bit data items (audio waveform data) on the audio waveform directly would require an enormous number of classes, 2^48, which places a heavy burden on the circuit. Therefore, the class classification unit 34 of this embodiment performs class classification based on the pattern compression data generated by the ADRC circuit unit provided therein. For example, if 1-bit quantization is performed on six audio waveform data items, they can be represented by 6 bits and classified into 2^6 = 64 classes.
[0045]
Here, letting DR be the dynamic range of the audio waveform in the extracted region, m the bit allocation, L the data level of each audio waveform data item, and Q the quantization code, the ADRC circuit unit performs quantization by the same operation as equation (1) above, dividing the interval between the maximum value MAX and the minimum value MIN in the region equally with the designated bit length. Thus, if the six waveform data items on the audio waveform each consist of, for example, 8 bits (m = 8), each of them is compressed to 2 bits in the ADRC circuit unit.
[0046]
Let q_n (n = 1 to 6) be the audio waveform data normalized by the dynamic range of the audio waveform and compressed in this way. Based on the compressed audio waveform data q_n, the class code generation circuit unit provided in the class classification unit 34 calculates the class code class indicating the class to which the block (q_1 to q_6) belongs by performing the same operation as the above equation (2). It then integrates the class code class with the polarity class (CLASS0, CLASS1, CLASS2, or CLASS3) calculated by the polarity discriminating unit 31, and supplies class code data D34 representing the integrated class code class′ to the prediction coefficient calculation unit 36. Incidentally, in equation (2), n represents the number of compressed audio waveform data q_n (n = 6 in this embodiment), and P represents the bit allocation (P = 2 in this embodiment).
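As a rough illustration of the class code generation and its integration with the polarity class, the following Python sketch may help. Equation (2) is not reproduced in this section, so the sketch assumes the usual packing class = sum over i of q_i * (2^P)^(i-1). The rule used to integrate the polarity class (placing it above the n*P waveform bits) is purely an assumption made for this example, as are the function names class_code and integrate_polarity.

```python
def class_code(q, p):
    """Pack the p-bit codes q_1..q_n into one integer.

    Assumed form of equation (2): class = sum_i q_i * (2**p)**(i - 1).
    """
    code = 0
    for i, qi in enumerate(q):  # i runs from 0, matching the exponent (i - 1)
        code += qi * (1 << p) ** i
    return code


def integrate_polarity(code, polarity, n, p):
    """Hypothetical integration rule: polarity class stored above the n*p code bits."""
    return polarity * (1 << (n * p)) + code


q = [0, 0, 1, 3, 2, 1]  # six 2-bit codes q_1..q_6
cls = class_code(q, p=2)
cls_prime = integrate_polarity(cls, polarity=1, n=6, p=2)
print(cls, cls_prime)
```

With n = 6 and P = 2 the waveform part takes 4096 values, so four polarity classes would give at most 4 * 4096 integrated class codes under this assumed rule.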
[0047]
In this way, the class classification unit 34 generates the class code data D34 and supplies it to the prediction coefficient calculation unit 36. In addition, the prediction calculation unit extraction unit 33 cuts out the time-domain audio waveform data D33 (x_1, x_2, ..., x_n) corresponding to the class code data D34 and supplies it to the prediction coefficient calculation unit 36.
[0048]
The prediction coefficient calculation unit 36 establishes a normal equation using the class code class′ supplied from the class classification unit 34, the audio waveform data D33 cut out for each class code class′, and the high-quality teacher audio data D30 supplied from the input terminal T_IN.
[0049]
That is, let the levels of n samples of the student audio data D37 be x_1, x_2, ..., x_n, and let the quantized data obtained by applying p-bit ADRC to each of them be q_1, ..., q_n. At this time, the class code class′ of this region is defined as in the above-described equation (2). Then, with the levels of the student audio data D37 denoted x_1, x_2, ..., x_n as described above and the level of the high-quality teacher audio data D30 denoted y, an n-tap linear estimation formula using prediction coefficients w_1, w_2, ..., w_n is set for each class code. This is expressed as
[0050]
[Expression 4]
$$y = w_1 x_1 + w_2 x_2 + \cdots + w_n x_n \qquad (4)$$
[0051]
Before learning, each w_n is an undetermined coefficient.
[0052]
The learning circuit 30 performs learning on a plurality of audio data for each class code. When the number of data samples is M, the following equation is set in accordance with the above equation (4):
[0053]
[Equation 5]
$$y_k = w_1 x_{k1} + w_2 x_{k2} + \cdots + w_n x_{kn} \qquad (5)$$
[0054]
where k = 1, 2, ..., M.
[0055]
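When M > n, the prediction coefficients w_1, ..., w_n are not uniquely determined. Therefore, the elements of an error vector e are defined by the following equation:
[0056]
[Formula 6]
$$e_k = y_k - \left( w_1 x_{k1} + w_2 x_{k2} + \cdots + w_n x_{kn} \right) \qquad (6)$$
[0057]
(where k = 1, 2, ..., M), and the prediction coefficients that minimize the following equation are found:
[0058]
[Expression 7]
$$e^2 = \sum_{k=1}^{M} e_k^2 \qquad (7)$$
[0059]
This is the so-called least squares method.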
[0060]
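Here, the partial derivative of equation (7) with respect to each prediction coefficient w_i is obtained. In this case,
[0061]
[Equation 8]
$$\frac{\partial e^2}{\partial w_i} = \sum_{k=1}^{M} 2 \, \frac{\partial e_k}{\partial w_i} \, e_k = -2 \sum_{k=1}^{M} x_{ki} \, e_k \qquad (i = 1, 2, \ldots, n) \qquad (8)$$
[0062]
each w_i (i = 1 to 6 in this embodiment) may be obtained so that equation (8) becomes 0.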
[0063]
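Then, X_ij and Y_i are defined as in the following equations:
[0064]
[Equation 9]
$$X_{ij} = \sum_{k=1}^{M} x_{ki} \, x_{kj} \qquad (9)$$
[0065]
[Expression 10]
$$Y_i = \sum_{k=1}^{M} x_{ki} \, y_k \qquad (10)$$
[0066]
With X_ij and Y_i defined in this way, setting equation (8) to 0 can be expressed using matrices as:
[0067]
[Equation 11]
$$\begin{bmatrix} X_{11} & X_{12} & \cdots & X_{1n} \\ X_{21} & X_{22} & \cdots & X_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ X_{n1} & X_{n2} & \cdots & X_{nn} \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{bmatrix} = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} \qquad (11)$$
[0068]
where the n unknowns are the prediction coefficients w_1, ..., w_n.
[0069]
This equation is generally called a normal equation. Here, n = 6.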
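The way the terms of the normal equation could be accumulated from learning data can be sketched in Python as follows. This is only an illustration under the assumption that X_ij is the sum over samples of x_ki * x_kj and Y_i is the sum over samples of x_ki * y_k, which is the standard least squares form. The function name accumulate_normal_equation and the toy data are hypothetical.

```python
def accumulate_normal_equation(samples, n):
    """Build X (n x n) and Y (length n) from (x, y) learning pairs.

    Assumed definitions: X[i][j] = sum_k x_ki * x_kj, Y[i] = sum_k x_ki * y_k.
    """
    X = [[0.0] * n for _ in range(n)]
    Y = [0.0] * n
    for x, y in samples:  # x: n student tap levels, y: teacher level
        for i in range(n):
            Y[i] += x[i] * y
            for j in range(n):
                X[i][j] += x[i] * x[j]
    return X, Y


# Toy data with n = 2 taps and M = 3 samples, generated from y = 2*x1 + 3*x2.
samples = [([1.0, 0.0], 2.0), ([0.0, 1.0], 3.0), ([1.0, 1.0], 5.0)]
X, Y = accumulate_normal_equation(samples, n=2)
print(X, Y)
```

Since X and Y are plain sums, they can be updated one sample at a time, so the learning data does not need to be held in memory all at once.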
[0070]
After all the learning data (teacher audio data D30, class code class′, audio waveform data D33) have been input, the prediction coefficient calculation unit 36 sets up the normal equation shown in the above equation (11) for each class code class′, solves it for each w_n using a general matrix solution method such as the sweep-out method (Gauss-Jordan elimination), and thereby calculates the prediction coefficients for each class code. The prediction coefficient calculation unit 36 writes the calculated prediction coefficients (D36) into the prediction coefficient memory 15.
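A minimal Python sketch of solving one normal equation per class code by the sweep-out method is shown below. It assumes the per-class matrices X and vectors Y have already been accumulated. The partial pivoting, the singularity threshold, the dictionary layout, and the names sweep_out, accumulators, and coefficient_memory are choices made for this illustration, not details taken from the embodiment.

```python
def sweep_out(X, Y):
    """Solve X w = Y by Gauss-Jordan elimination with partial pivoting."""
    n = len(Y)
    a = [row[:] + [Y[i]] for i, row in enumerate(X)]  # augmented matrix [X | Y]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("singular normal equation (too little learning data?)")
        a[col], a[pivot] = a[pivot], a[col]
        pv = a[col][col]
        a[col] = [v / pv for v in a[col]]  # scale the pivot row to 1
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [a[i][n] for i in range(n)]


# Hypothetical per-class accumulators: class code -> (X, Y), here with n = 2.
accumulators = {5840: ([[2.0, 1.0], [1.0, 2.0]], [7.0, 8.0])}
coefficient_memory = {c: sweep_out(X, Y) for c, (X, Y) in accumulators.items()}
print(coefficient_memory)
```

A class that received too few learning samples yields a singular matrix. The sketch raises an error in that case, and how such classes should be handled is left open here.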
[0071]
As a result of such learning, the prediction coefficient memory 15 stores, for each class code, prediction coefficients for estimating the high-quality audio data y for each pattern defined by the quantized data q_1, ..., q_6. This prediction coefficient memory 15 is used in the audio signal processing apparatus 10 described above with reference to FIG. With this process, the learning of the prediction coefficients for creating high-quality audio data from normal audio data according to the linear estimation formula is completed.
[0072]
In this way, the learning circuit 30 performs the thinning process on the high-quality teacher audio data with the student signal generation filter 37 in consideration of the degree of interpolation processing performed by the audio signal processing apparatus 10, and can thereby generate the prediction coefficients for the interpolation processing in the audio signal processing apparatus 10.
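The thinning step can be pictured with the following minimal Python sketch, which keeps every factor-th sample of the teacher data. Whether the student signal generation filter 37 also band-limits the signal before thinning is not stated here, so no filtering is modeled. The function name thin_out and the sample values are hypothetical.

```python
def thin_out(teacher, factor):
    """Keep every `factor`-th teacher sample to form student data (no filtering)."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    return teacher[::factor]


teacher = [0, 3, 6, 9, 12, 15, 18, 21]
student = thin_out(teacher, 2)  # would pair with a 2x interpolation at conversion time
print(student)
```

The thinning factor would be chosen to match the interpolation ratio of the conversion side, so that the learned coefficients map student-rate taps onto teacher-rate samples.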
[0073]
In the above configuration, the audio signal processing apparatus 10 obtains the class code class of the audio waveform itself by normalizing the audio waveform with its dynamic range in the ADRC processing of the class classification unit 14. In this case, phonemes often differ between the portion of the audio waveform near the zero level and the large-amplitude portion, and if class classification is performed only on the result of normalizing with the dynamic range, phonemes that are originally different may be regarded as the same class. Accordingly, the class classification unit 14 calculates the class code class′ by integrating the polarity class CLASS of the audio waveform into the class code class of the audio waveform itself, and uses this as the class classification result for the prediction calculation. Thus, even when the class codes obtained from the audio waveform itself are the same, the waveforms can be reliably classified according to the polarity class of the audio waveform.
[0074]
For example, when the polarity class is CLASS0 or CLASS3, all the values of the cut-out audio waveform data are positive or all are negative, which indicates a waveform portion with a relatively large amplitude. When the polarity class is CLASS1 or CLASS2, the cut-out audio waveform includes a zero-cross portion and takes both positive and negative values, which indicates a waveform portion relatively near the zero level. By integrating the polarity class into the class code class of the audio waveform data itself and performing class classification, the class classification unit 14 can classify different phonemes as different class codes.
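The polarity classes described above can be sketched in Python as follows. CLASS0 (all positive) and CLASS3 (all negative) follow the text. How CLASS1 and CLASS2 are distinguished is not stated in this section, so the sketch assumes, purely for illustration, that CLASS1 is a zero-cross region starting on the non-negative side and CLASS2 one starting on the negative side. Treating a sample exactly equal to zero as part of a zero-cross region is also an assumption, and the function name polarity_class is hypothetical.

```python
def polarity_class(block):
    """Return 0..3 for the polarity class of one cut-out region.

    0: all samples positive, 3: all samples negative (from the text).
    1 / 2: region contains a zero cross; split by the sign of the first
    sample, which is an assumption made for this sketch only.
    """
    if all(v > 0 for v in block):
        return 0
    if all(v < 0 for v in block):
        return 3
    return 1 if block[0] >= 0 else 2


print(polarity_class([5, 9, 20, 15, 8, 3]))
print(polarity_class([4, 2, -1, -3, -2, -1]))
```

Because ADRC normalizes away the absolute level, this extra label is what separates a large-amplitude segment from a near-zero segment with the same normalized shape.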
[0075]
According to the above configuration, the input audio data D10 is classified using its polarity class, and the prediction calculation is performed using the prediction coefficients based on the classification result, so that the input audio data D10 can be converted into audio data D16 with higher sound quality.
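The conversion-side prediction calculation can be sketched in Python as a lookup followed by an n-tap weighted sum, y = w_1 x_1 + ... + w_n x_n. The dictionary standing in for the prediction coefficient memory, the class code value 5840, and the coefficient values are placeholders invented for this example, not learned results.

```python
def predict(taps, class_code, coefficient_memory):
    """Estimate one output sample as the weighted sum of the taps for its class."""
    w = coefficient_memory[class_code]  # coefficients stored per class code
    if len(w) != len(taps):
        raise ValueError("tap count does not match coefficient count")
    return sum(wi * xi for wi, xi in zip(w, taps))


# Placeholder coefficients for one hypothetical class code (not learned values).
coefficient_memory = {5840: [0.1, 0.2, 0.3, 0.2, 0.1, 0.1]}
y = predict([10, 20, 30, 40, 30, 20], 5840, coefficient_memory)
print(y)
```

Only the class code selects which coefficient set is used, so the per-sample cost at conversion time is one table lookup and n multiply-accumulate operations.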
[0076]
In the above-described embodiment, the case has been described in which, in the audio signal processing device 10 and the learning device 30, the class classification unit extraction units 12 and 32 and the prediction calculation unit extraction units 13 and 33 always cut out the input audio data D10 and D37 over a fixed range. However, the present invention is not limited to this. For example, as shown in FIG. 6 and FIG. 7, in which the same reference numerals are assigned to parts corresponding to FIG. 1 and FIG., the cut-out range (tap) of the input audio data D10 and D37 may be controlled by supplying extraction control signals CONT11 and CONT31, based on the polarity classes calculated in step S12, to the variable class classification unit extraction unit 12′ and the variable prediction calculation unit extraction unit 13′, and to the variable class classification unit extraction unit 32′ and the variable prediction calculation unit extraction unit 33′, respectively.
[0077]
In this case, the polarity discriminating units 11 and 31 control the cut-out range (tap cut-out length) based on the frequency of the polarity classes CLASS0, CLASS1, CLASS2 and CLASS3. By increasing the tap cut-out length, the classification frequency can be prevented from skewing toward only positive polarity (CLASS0) or only negative polarity (CLASS3).
[0078]
In this case, as shown in FIG., the audio data conversion processing procedure may insert, after step SP12 for determining the polarity of the audio waveform, a processing step SP21 that controls, based on the determined polarity, the tap extraction region in the variable class classification unit extraction units 12′ and 32′ and the variable prediction calculation unit extraction units 13′ and 33′.
[0079]
In the above-described embodiment, the case where four polarity classes CLASS0, CLASS1, CLASS2, and CLASS3 are provided has been described. However, the present invention is not limited to this, and the signal may instead be classified into three polarity classes: an all-positive region, an all-negative region, and a region containing a zero cross.
[0080]
Further, in the above-described embodiment, the case where a linear first-order method is used as the prediction method has been described. However, the present invention is not limited to this. In short, it suffices to use the learned result, and various other prediction methods can be applied.
[0081]
In the above-described embodiment, the case where the class classification unit 14 generates a compressed data pattern by ADRC has been described. However, the present invention is not limited to this, and compression means such as lossless coding (DPCM: Differential Pulse Code Modulation) or vector quantization (VQ: Vector Quantize) may be used.
[0082]
In the above-described embodiment, the case where the student signal generation filter 37 of the learning circuit 30 thins out a predetermined number of samples from the teacher audio data D30 has been described. However, the present invention is not limited to this, and various other methods, such as reducing the number of bits, can be applied.
[0083]
[Effects of the Invention]
To solve the above problem, the present invention classifies the class of a digital audio signal based on the polarity of the digital audio signal with reference to zero level, and converts the digital audio signal by a prediction method corresponding to the classified class. This makes it possible to classify according to phonemes, so that conversion better adapted to the characteristics of the digital audio signal can be performed.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a configuration of a digital signal processing apparatus according to the present invention.
FIG. 2 is a signal waveform diagram for explaining polarity discrimination.
FIG. 3 is a block diagram showing a configuration of an audio signal processing apparatus.
FIG. 4 is a flowchart showing an audio signal conversion processing procedure.
FIG. 5 is a block diagram showing a configuration of a learning device according to the present invention.
FIG. 6 is a block diagram showing another embodiment of the digital signal processing apparatus.
FIG. 7 is a block diagram showing another embodiment of the learning device.
FIG. 8 is a flowchart showing an audio signal conversion processing procedure according to another embodiment.
[Explanation of symbols]
10 ... Audio signal processing apparatus, 11, 31 ... Polarity discriminating unit, 14, 34 ... Class classification unit, 15 ... Prediction coefficient memory, 16 ... Prediction calculation unit, 36 ... Prediction coefficient calculation unit, 37 ... Student signal generation filter.

Claims

A digital signal processing apparatus for converting a digital audio signal, comprising:
polarity discriminating means for discriminating the polarity of the digital audio signal with reference to zero level;
class classification means for classifying the class of the digital audio signal based on the polarity discrimination result; and
prediction calculation means for generating a new digital audio signal, converted from the digital audio signal, by performing a prediction calculation on the digital audio signal by a prediction method corresponding to the classified class.

The digital signal processing apparatus according to claim 1, wherein the polarity discriminating means divides the digital audio signal into time axis regions and discriminates the polarity for each divided region.

The digital signal processing apparatus according to claim 1, wherein the polarity discriminating means classifies the digital audio signal into at least three regions, that is, a positive-only region, a negative-only region, and a region including a zero cross.

The digital signal processing apparatus according to claim 1, wherein the prediction calculation means uses a prediction coefficient generated in advance by learning based on a desired digital audio signal.

A digital signal processing method for converting a digital audio signal, comprising:
a polarity determination step of determining the polarity of the digital audio signal with reference to zero level;
a class classification step of classifying the class of the digital audio signal based on the polarity determination result; and
a prediction calculation step of generating a new digital audio signal, converted from the digital audio signal, by performing a prediction calculation on the digital audio signal by a prediction method corresponding to the classified class.

The digital signal processing method according to claim 5, wherein in the polarity determination step, the digital audio signal is divided into time axis regions, and the polarity is determined for each divided region.

The digital signal processing method according to claim 5, wherein in the polarity determination step, the digital audio signal is classified into at least three regions, that is, a positive-only region, a negative-only region, and a region including a zero cross.

The digital signal processing method according to claim 5, wherein a prediction coefficient generated in advance by learning based on a desired digital audio signal is used in the prediction calculation step.

A learning apparatus for generating prediction coefficients used in the prediction calculation of the conversion processing of a digital signal processing apparatus that converts a digital audio signal, comprising:
student digital audio signal generating means for generating, from a desired digital audio signal, a student digital audio signal obtained by degrading the digital audio signal;
polarity discriminating means for discriminating the polarity of the student digital audio signal with reference to zero level;
class classification means for classifying the class of the student digital audio signal based on the determined polarity; and
prediction coefficient calculation means for calculating a prediction coefficient corresponding to the class based on the digital audio signal and the student digital audio signal.

The learning apparatus according to claim 9, wherein the polarity discriminating means divides the digital audio signal into time axis regions and determines the polarity for each divided region.

The learning apparatus according to claim 9, wherein the polarity discriminating means classifies the digital audio signal into at least three regions, that is, a positive-only region, a negative-only region, and a region including a zero cross.

A learning method for generating prediction coefficients used in the prediction calculation of the conversion processing of a digital signal processing apparatus that converts a digital audio signal, comprising:
a student digital audio signal generating step of generating, from a desired digital audio signal, a student digital audio signal obtained by degrading the digital audio signal;
a polarity determination step of determining the polarity of the student digital audio signal with reference to zero level;
a class classification step of classifying the class of the student digital audio signal based on the determined polarity; and
a prediction coefficient calculation step of calculating a prediction coefficient corresponding to the class based on the digital audio signal and the student digital audio signal.

The learning method according to claim 12, wherein in the polarity determination step, the digital audio signal is divided into time axis regions, and the polarity is determined for each divided region.

The learning method according to claim 12, wherein in the polarity determination step, the digital audio signal is classified into at least three regions, that is, a positive-only region, a negative-only region, and a region including a zero cross.

A computer-readable program storage medium storing a program for executing:
a polarity determination step of determining the polarity of a digital audio signal with reference to zero level;
a class classification step of classifying the class of the digital audio signal based on the polarity determination result; and
a prediction calculation step of generating a new digital audio signal, converted from the digital audio signal, by performing a prediction calculation on the digital audio signal using a prediction coefficient corresponding to the classified class.

A computer-readable program storage medium storing a program for executing:
a student digital audio signal generating step of generating, from a desired digital audio signal, a student digital audio signal obtained by degrading the digital audio signal;
a polarity determination step of determining the polarity of the student digital audio signal with reference to zero level;
a class classification step of classifying the class of the student digital audio signal based on the determined polarity; and
a prediction coefficient calculation step of calculating a prediction coefficient corresponding to the class based on the digital audio signal and the student digital audio signal.