JP3725876B2

JP3725876B2 - Audio encoder and its encoding processing program

Info

Publication number: JP3725876B2
Application number: JP2003033915A
Authority: JP
Inventors: 裕二奥田
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2002-03-27
Filing date: 2003-02-12
Publication date: 2005-12-14
Anticipated expiration: 2023-02-12
Also published as: JP2004004554A

Description

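[0001]
[Technical Field of the Invention]
The present invention relates to an audio encoder provided in, for example, an MPEG (Moving Picture Experts Group) audio recorder, and to an encoding processing program used by that encoder to encode audio signals.
[0002]
[Prior Art]
MPEG audio recorders use audio encoders that employ psychoacoustic analysis. An encoder of this type first converts the input audio signal from a time-domain signal into a frequency-domain signal in a time/frequency conversion unit. In parallel, a psychoacoustic analysis unit uses fast Fourier transform (FFT) analysis or the like to calculate the signal-to-mask ratio (SMR) of each scale factor band and the psychoacoustic entropy. Next, in an iteration loop of quantization, variable-length coding, and buffer control, the masking level of the quantization error, based on psychoacoustics, is calculated from the SMR of each scale factor band. Quantization and variable-length coding are then executed repeatedly within the number of available bits, which yields optimal scale factors for which the quantization error is at or below the masking level.
[0003]
Because the iteration loop uses variable-length coding, the amount of code generated varies from frame to frame. A bit reservoir technique is therefore used. In frames that generate little code, the surplus bits are stored in the bit reservoir, up to its maximum capacity. The stored bits are then used in frames that need many bits for encoding, such as at the onset of an audio signal. Whether a frame needs many bits is determined from the psychoacoustic entropy.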
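The bit reservoir behavior described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the encoder's actual implementation: the class name `BitReservoir`, its methods `deposit` and `withdraw`, and the capacity value are all hypothetical.

```python
class BitReservoir:
    """Toy bit reservoir: keeps unused bits up to a maximum capacity."""

    def __init__(self, capacity):
        self.capacity = capacity  # maximum storable bits (assumed parameter)
        self.stored = 0

    def deposit(self, unused_bits):
        # Store surplus bits, but never exceed the maximum storage amount.
        self.stored = min(self.capacity, self.stored + max(0, unused_bits))

    def withdraw(self, requested_bits):
        # Grant at most what is currently stored and return the granted amount.
        granted = min(self.stored, max(0, requested_bits))
        self.stored -= granted
        return granted


reservoir = BitReservoir(capacity=100)
reservoir.deposit(70)   # a frame that needed few bits
reservoir.deposit(70)   # a second one; the reservoir saturates at its capacity
extra = reservoir.withdraw(30)  # a demanding frame borrows bits
```

A frame judged demanding by its psychoacoustic entropy would call `withdraw`; how much it requests is outside the scope of this sketch.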
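[0004]
The scale factors obtained by the iteration loop are input to a bitstream forming unit. The bitstream forming unit forms an audio bitstream in a predetermined format from the codes of the optimal scale factors obtained by the iteration loop, the variable-length codes of the corresponding quantized values, and the side information. The audio bitstream thus formed is then transmitted or stored on a recording medium (see, for example, Non-Patent Document 1).
[0005]
The iteration loop of quantization, variable-length coding, and buffer control performs the following processing. First, the masking level of the quantization error, based on psychoacoustics, is calculated for each scale factor band from the per-band SMR described above. Next, the number of bits to add (add_bits) is determined by taking into account the number of bits stored in the bit reservoir and the psychoacoustic entropy. This number of additional bits (add_bits) is added to the average number of bits per frame (mean_bits), which follows from the set bit rate, and the sum gives the maximum number of available bits (max_bits).
[0006]
Once the maximum number of available bits (max_bits) has been determined, the scale factors and related values are initialized, and the actual frequency samples are quantized using the current scale factors. The quantized values are variable-length coded, which gives the number of bits required. If the required number of bits exceeds the maximum number of available bits, the quantization step size is increased until the required number of bits is no greater than the number of available bits, which holds down the required number of bits.
[0007]
Next, the quantized values thus obtained are compared with the original frequency samples to obtain the quantization distortion of each scale factor band, and the number of scale factor bands in which the quantization distortion exceeds the masking level (over) is determined. If the quantization distortion is at or below the masking level in every scale factor band (over=0), the loop is exited, the scale factors at that point are saved, and the number of bits used for encoding is calculated. If there is a difference between the number of bits used and the maximum number of available bits (the number of unused bits), the unused bits are stored in the bit reservoir for use in encoding subsequent frames.
[0008]
If, on the other hand, the quantization distortion exceeds the masking level in some scale factor band (over!=0), it is determined whether all of the affected scale factors can be modified without exceeding their upper limit. If they can, those scale factors are increased. The loop is then repeated.
[0009]
When the set bit rate is high and the maximum number of available bits is sufficient, a combination of scale factors that keeps the quantization distortion within the allowable range is found. When few bits are available, the loop is executed repeatedly until the scale factors reach their upper limit. In that case the scale factors are judged to be unmodifiable, and a suboptimal combination of scale factors is saved. The number of bits used for encoding is then calculated, and if there is a difference between the number of bits used and the maximum number of available bits (the number of unused bits), the unused bits are stored in the bit reservoir for use in encoding subsequent frames.
[0010]
When the input audio signal is a stereo signal, the above iteration loop of quantization, variable-length coding, and buffer control is executed for the signal of each of the left and right channels in turn.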
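[0011]
[Non-Patent Document 1]
総合マルチメディア選書「ＭＰＥＧ」, edited by the Institute of Image Information and Television Engineers (映像情報メディア学会),
Chapter 6, "MPEG Audio Coding" (especially pp. 141-153)
[0012]
[Problems to Be Solved by the Invention]
In such a conventional audio encoder, however, the maximum number of available bits is calculated sequentially for each frame of the input audio signal from the number of bits stored in the bit reservoir, and the calculated number of bits is allocated to the encoding of that frame. When the input audio signal is a stereo signal, this processing is likewise carried out so that the result of processing one channel is carried over to the processing of the other channel.
[0013]
As a result, when the set bit rate is low and the maximum number of available bits is insufficient, the channel of the stereo signal that is encoded first is allocated more bits, and the encoding quality of the channel encoded later tends to deteriorate.
[0014]
The present invention was made in view of these circumstances. Its object is to provide an audio encoder, and an encoding processing program for it, that can allocate an appropriate number of bits to each of a plurality of channels when encoding a multichannel audio signal such as a stereo signal, even when the set bit rate is low, and that thereby reduce the variation in encoding quality between channels.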
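[0015]
[Means for Solving the Problems]
To achieve this object, the present invention provides an audio encoder, and an encoding program for it, that encode an audio signal composed of a plurality of channels using variable-length coding and a bit reservoir technique. The encoder detects the difference in the amount of information between the channels of the input audio signal and, based on the detection result, corrects the number of available bits allocated to each frame of the audio signal. The process of quantizing and variable-length coding the audio signal of each channel based on scale factors is then repeated, within the corrected number of available bits, until the quantization distortion is at or below the masking level, and the results of the quantization and variable-length coding are formatted into an audio bitstream.
[0016]
Specifically, the power ratio between the channels of the audio signal is detected, and the number of available bits determined by the bit allocation is corrected based on the detected power ratio.
[0017]
Alternatively, the ratio between channels of the psychoacoustic entropy obtained from the audio signal by psychoacoustic analysis is detected, and the number of available bits determined by the bit allocation is corrected based on the detected psychoacoustic entropy ratio.
[0018]
According to the invention, therefore, the difference in the amount of information between the channels of the audio signal is detected before the iteration loop of quantization, variable-length coding, and buffer control, and the number of available bits determined by the bit allocation is corrected based on the detection result. Consequently, even when the set bit rate is low and the maximum number of available bits is insufficient, an appropriate number of bits is allocated to each channel. Not only the channel encoded first but also the channel encoded later can thus be encoded without loss of quality, which reduces the variation in encoding quality between channels.
[0019]
The invention is further characterized in that, when the iteration loop repeatedly modifies the scale factors in search of an optimal combination for which the quantization distortion is at or below the masking level, and no combination is found that achieves this in every scale factor band, a suboptimal combination of scale factors is obtained instead.
[0020]
The following means can be used to obtain a suboptimal combination of scale factors.
The first means retains the scale factors for which the number of scale factor bands whose quantization distortion exceeds the masking level is smallest. If no optimal combination of scale factors is found that brings the quantization distortion to or below the masking level in every scale factor band, a suboptimal combination is obtained from the retained scale factors.
[0021]
The second means retains the scale factors for which the sum of values weighted by the bandwidths of the scale factor bands whose quantization distortion exceeds the masking level, that is, the total bandwidth of those bands, is smallest. If no optimal combination of scale factors is found that brings the quantization distortion to or below the masking level in every scale factor band, a suboptimal combination is obtained from the retained scale factors.
The third means retains the scale factors for which the sum of the differences between the quantization distortion and the masking level, taken over the scale factor bands whose quantization distortion exceeds the masking level, that is, the total excess distortion of those bands, is smallest. If no optimal combination of scale factors is found that brings the quantization distortion to or below the masking level in every scale factor band, a suboptimal combination is obtained from the retained scale factors.
[0022]
By preparing suboptimal scale factors in this way, degradation in the quality of the encoded audio signal can be suppressed even when the set bit rate is low and no optimal combination of scale factors is found that brings the quantization distortion to or below the masking level in every scale factor band.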
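[0023]
[Embodiments of the Invention]
(First embodiment)
FIG. 1 is a functional block diagram showing a first embodiment of an audio encoder according to the present invention. The audio encoder comprises a hybrid filter bank 1, a psychoacoustic analysis unit 2, an iteration loop 3, and a bitstream forming unit 4.
[0024]
The psychoacoustic analysis unit 2 has a fast Fourier transform (FFT: Fast Fourier Transform) unit 21, an unpredictability measurement unit 22, a signal-to-mask ratio (SMR: Signal-to-Mask Ratio) calculation unit 23, and a psychoacoustic entropy evaluation unit 24.
[0025]
The psychoacoustic analysis unit 2 receives a PCM stereo audio signal AS that has been linearly quantized to 16 bits by, for example, a PCM (Pulse Code Modulation) encoding unit (not shown). The input PCM stereo audio signal AS undergoes FFT analysis in the fast Fourier transform unit 21, after which the unpredictability measurement unit 22 measures its unpredictability. Based on the FFT analysis, the SMR calculation unit 23 calculates the SMR of each subband (scale factor band) of the input PCM stereo audio signal AS. The psychoacoustic entropy evaluation unit 24 obtains the psychoacoustic entropy from the calculated SMR.
[0026]
The psychoacoustic analysis model is described in detail in Sugiyama, "High-Efficiency Coding of Acoustic Signals" (音響信号の高能率符号化), lecture series "Basic Technology of Digital Television Broadcasting" (ディジタルテレビ放送の基礎技術), テレビ誌, 48, 4, pp. 447-454 (Apr. 1994).
[0027]
The hybrid filter bank 1 has a subband analysis filter bank 11, an adaptive-block-length modified discrete cosine transform (MDCT: Modified Discrete Cosine Transform) unit 12, and an aliasing reduction butterfly 13.
[0028]
The input PCM stereo audio signal AS is converted from a time-domain signal into a frequency-domain signal by the subband analysis filter bank 11 and divided into, for example, 32 subband (scale factor band) signals. A 512-tap polyphase filter bank (PFB: Polyphase Filter Bank), for example, is used for the subband analysis.
[0029]
The adaptive-block-length MDCT unit 12 serves to suppress pre-echo; it maps each of the divided subband signals onto finer spectral lines. The block length of the MDCT unit 12 is determined from the psychoacoustic entropy, which is derived using the unpredictability and is obtained by the psychoacoustic entropy evaluation unit 24. The aliasing reduction butterfly 13 removes the frequency-domain aliasing distortion contained in the mapped signal produced by the adaptive-block-length MDCT unit 12.
[0030]
The iteration loop 3 has a nonlinear quantization unit 31, a scale factor calculation unit 32, a buffer control unit 33, a Huffman coding unit 34, and a side information encoding unit 35.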
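[0031]
The scale factor calculation unit 32 calculates, for each scale factor band, the masking level of the quantization error based on the psychoacoustic model, using the per-band signal-to-mask ratio (SMR) obtained by the SMR calculation unit 23. It also determines the maximum number of bits available for nonlinear quantization from the number of bits stored in the bit reservoir of the buffer control unit 33 and the psychoacoustic entropy obtained by the psychoacoustic entropy evaluation unit 24. In addition, before determining this maximum, the scale factor calculation unit 32 calculates the power ratio between the left and right channels of the input audio signal. The calculated power ratio is used to correct the maximum number of available bits. The corrected number of available bits is supplied to the nonlinear quantization unit 31 for the nonlinear quantization process.
[0032]
The nonlinear quantization unit 31 nonlinearly quantizes the mapped signal output from the aliasing reduction butterfly 13 in accordance with the bit allocation made by the scale factor calculation unit 32. The quantized values are variable-length coded by the Huffman coding unit 34, which gives the required number of bits. This required number of bits is compared with the maximum number of available bits, and the quantization step size is changed according to the result.
[0033]
The nonlinear quantization and variable-length coding run in an iteration loop and are repeated until the quantization error is at or below the masking level in every scale factor band. Once it is, the scale factors obtained at that point are used. The loop also calculates the number of bits used for encoding, and the unused bits are stored in the bit reservoir of the buffer control unit 33 for encoding subsequent frames.
[0034]
After the iteration loop ends, the side information encoding unit 35 encodes the scale factors calculated by the scale factor calculation unit 32 as side information, together with the bit allocation information and the Huffman table.
[0035]
The bitstream forming unit 4 multiplexes a header, the data encoded by the Huffman coding unit 34, and the side information encoded by the side information encoding unit 35 according to a predetermined format to form a bitstream. The bitstream is then stored on a storage medium or transmitted over a communication channel.
[0036]
Next, the processing that the iteration loop 3 of quantization, variable-length coding, and buffer control performs on a stereo signal in the audio encoder configured as above is described. This processing is implemented by having a microcomputer or a DSP (Digital Signal Processor) execute a program.
[0037]
FIG. 2 is a flowchart showing the overall procedure and content of this processing (the main routine). In the iteration loop 3, step 2a first calculates the power ratio between the left and right channels of the input stereo audio signal AS. The power ratio is calculated by the following equation.
[Equation 1]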
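The body of the power-ratio equation is not reproduced in this text, so the following Python sketch is only one plausible reading, not the patent's formula. It assumes that the power of a channel is the sum of its squared samples and that the ratio is normalized by the total power of both channels, by analogy with equations (6) and (7) for the psychoacoustic entropy ratio. The function name `channel_power_ratio` and the even split for a silent frame are hypothetical.

```python
def channel_power_ratio(left, right):
    """Return [x_ratio[0], x_ratio[1]] from per-channel signal power.

    Assumption: power is the sum of squared samples, normalized by the
    total power of both channels (mirrors equations (6) and (7)).
    """
    power = [sum(s * s for s in channel) for channel in (left, right)]
    total = power[0] + power[1]
    if total == 0:
        return [0.5, 0.5]  # silent frame: split evenly (assumed convention)
    return [p / total for p in power]


# The left channel carries nine times the power of the right channel.
x_ratio = channel_power_ratio([3, 0], [1, 0])
```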

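[0038]
The iteration loop 3 then performs the iteration loop processing of quantization, variable-length coding, and buffer control on the left and right channels of the input stereo audio signal AS in turn.
[0039]
In step 2b the channel number ch is first initialized (ch=0). Step 2d then executes the iteration loop processing of quantization, variable-length coding, and buffer control on the input audio signal of the channel corresponding to ch=0, for example the left channel.
[0040]
When the iteration loop processing for the left channel ends, step 2e increments the channel number (ch++), and step 2d executes the iteration loop processing for the channel corresponding to ch=1, for example the right channel. When the iteration loop processing for both channels has ended, step 2c confirms this and the processing terminates.
[0041]
The iteration loop processing of quantization, variable-length coding, and buffer control for each channel is performed as follows. FIG. 3 is a flowchart showing the procedure and content of this subroutine.
[0042]
First, in step 3a, the masking level of the quantization error, based on psychoacoustics, is calculated for each scale factor band from the per-band SMR calculated by the SMR calculation unit 23. Then, in step 3b, the number of bits to add (add_bits) is determined by taking into account the number of bits stored in the bit reservoir of the buffer control unit 33 and the psychoacoustic entropy. This add_bits value is then corrected using the power ratio between the left and right channels (x_ratio[ch]) calculated in step 2a, as follows.
add_bits = add_bits * x_ratio[ch] (4)
[0043]
The corrected number of additional bits (add_bits) is added to the average number of bits per frame (mean_bits), which follows from the set bit rate, and the sum gives the maximum number of available bits (max_bits). The calculation is as follows.
max_bits = mean_bits + add_bits (5)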
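Equations (4) and (5) can be combined into a single helper, sketched here in Python. The identifiers `mean_bits`, `add_bits`, `x_ratio`, and `ch` come from the text; the function name `max_bits_for_channel` and the numeric values are illustrative assumptions.

```python
def max_bits_for_channel(mean_bits, add_bits, x_ratio, ch):
    """Apply equations (4) and (5) for channel ch."""
    add_bits = add_bits * x_ratio[ch]  # equation (4): scale by the channel ratio
    return mean_bits + add_bits        # equation (5): max_bits


# Illustrative values: the left channel holds 75 % of the information.
x_ratio = [0.75, 0.25]
left_max = max_bits_for_channel(1000, 400, x_ratio, 0)
right_max = max_bits_for_channel(1000, 400, x_ratio, 1)
```

With these values the channel carrying more information receives the larger share of the additional bits, regardless of which channel is encoded first.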
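[0044]
Once the maximum number of available bits (max_bits) has been calculated, step 3c initializes the scale factors and related values, and step 3d then performs quantization and variable-length coding. In step 3d, the actual frequency samples are quantized using the current scale factors. The quantized values are variable-length coded by the Huffman coding unit 34, which gives the number of bits required for encoding.
[0045]
This required number of bits is compared with the maximum number of available bits (max_bits). If it exceeds max_bits, the quantization step size is increased until the required number of bits is no greater than max_bits, which holds down the required number of bits.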
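The step-size adjustment in step 3d can be sketched as a simple search. This Python example is an illustration under stated assumptions: `count_bits` is a hypothetical callback that stands in for quantization followed by Huffman coding and returns the required number of bits for a given step size, and `max_step` is an assumed safety bound that the text does not mention.

```python
def fit_step_size(count_bits, max_bits, step=1, max_step=255):
    """Increase the quantization step until the coded size fits max_bits."""
    while count_bits(step) > max_bits and step < max_step:
        step += 1  # a coarser step gives smaller quantized values, hence fewer bits
    return step


# Toy bit-count model (assumption): bits shrink in inverse proportion to the step.
step = fit_step_size(lambda s: 1000 // s, max_bits=120)
```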
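[0046]
In step 3e, the quantized values determined in step 3d are compared with the original frequency samples to obtain the quantization distortion of each scale factor band. In step 3m, the number of scale factor bands in which this quantization distortion exceeds the masking level (over) is obtained. Step 3f then determines whether over is 0 (over=0?).
[0047]
If over is 0, that is, if the quantization distortion is at or below the masking level in every scale factor band, the loop is exited and processing moves to step 3g, where the current scale factors are saved. Step 3h calculates the number of bits used for encoding, and the difference between this number and the maximum number of available bits (the number of unused bits) is stored in the bit reservoir for encoding subsequent frames.
[0048]
Suppose, on the other hand, that step 3f finds that the quantization distortion exceeds the masking level in some scale factor band (over!=0). Step 3i then determines whether the scale factors can be modified. If all of them can be modified without exceeding the upper limit, step 3j increases them. Processing then returns to step 3d, and the loop of quantization and variable-length coding in steps 3d through 3j is repeated until the quantization distortion in every scale factor band is within the allowable range.
[0049]
In this loop, when the set bit rate is high and the maximum number of available bits is sufficient, a combination of scale factors that keeps the quantization distortion within the allowable range is found. When the set bit rate is low and few bits are available, the loop repeats until the scale factors reach their upper limit. If step 3i determines that the scale factors cannot be modified without exceeding the upper limit, step 3k selects and saves a suboptimal combination of scale factors. Step 3h also calculates the number of bits used for encoding, and the difference between this number and the maximum number of available bits (the number of unused bits) is stored in the bit reservoir of the buffer control unit 33 for encoding subsequent frames.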
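The control flow of steps 3c through 3k can be sketched as follows. This Python example is a simplified illustration, not the encoder itself: it omits the bit-count check of step 3d, `distortion` is a hypothetical callback returning the quantization distortion of one band for a given scale factor, and scale factors are modeled as small integers with an assumed upper limit `sf_max`.

```python
def outer_loop(distortion, mask, sf_max):
    """Raise scale factors of over-mask bands until all pass or a limit is hit.

    Returns (scale_factors, found_optimal).
    """
    n_bands = len(mask)
    sf = [0] * n_bands                                   # step 3c: initialize
    while True:
        over_bands = [b for b in range(n_bands)
                      if distortion(b, sf[b]) > mask[b]]  # steps 3e and 3m
        if not over_bands:                               # step 3f: over == 0
            return sf, True                              # step 3g: save and exit
        if any(sf[b] >= sf_max for b in over_bands):     # step 3i: not modifiable
            return sf, False                             # step 3k: suboptimal set
        for b in over_bands:                             # step 3j: increase
            sf[b] += 1


# Toy model (assumption): each scale factor increment halves the distortion.
halving = lambda band, s: 8.0 / (2 ** s)
result = outer_loop(halving, mask=[8.0, 2.0, 0.5], sf_max=3)
```

The loop always terminates because every pass either exits or raises at least one bounded scale factor.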
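[0050]
As described above, in the first embodiment the iteration loop 3 first calculates the power ratio between the left and right channels of the stereo audio signal AS (x_ratio[ch]). It then corrects the number of additional bits (add_bits) using this ratio and thereby determines the maximum number of available bits (max_bits) for each frame of the stereo audio signal AS. Within this max_bits, each frame of the stereo audio signal AS is quantized and variable-length coded to find scale factors for which the quantization distortion is at or below the masking level.
[0051]
The maximum number of available bits (max_bits) is thus controlled according to the power ratio between the left and right channels of the input audio signal AS (x_ratio[ch]). Even when the set bit rate is low and max_bits is insufficient, this eliminates the problem that the channel encoded first is allocated many bits while the channel encoded later runs short of available bits. As a result, not only the channel encoded first but also the channel encoded later can be encoded without loss of quality, which reduces the variation in encoding quality between channels.
[0052]
(Second embodiment)
In the second embodiment of the invention, when the iteration loop of quantization, variable-length coding, and buffer control repeatedly modifies the scale factors in search of an optimal combination for which the quantization distortion is at or below the masking level, and no combination is found that achieves this in every scale factor band, a suboptimal combination of scale factors is obtained from the scale factors for which the number of scale factor bands whose quantization distortion exceeds the masking level was smallest.
[0053]
FIG. 4 is a flowchart showing the procedure and content of the iteration loop subroutine of quantization, variable-length coding, and buffer control in the audio encoder according to the second embodiment. Parts identical to those in FIG. 3 are given the same reference signs and are not described in detail. The configuration of the audio encoder is the same as in FIG. 1 and is not described again here.
[0054]
Once step 3m has calculated the number of scale factor bands in which the quantization distortion exceeds the masking level (over), step 3n compares this over with the smallest such number so far (over_min). If over is smaller than over_min, step 3o updates over_min to the over just obtained. Step 3o also saves the current scale factors as the suboptimal scale factors.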
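The bookkeeping of steps 3n and 3o can be sketched in Python as follows. The names `over` and `over_min` come from the text; the function name `track_min_over` and the idea of feeding it a precomputed list of (scale factors, over) pairs, one per loop pass, are simplifying assumptions made for illustration.

```python
def track_min_over(candidates):
    """Keep the scale factors seen with the fewest over-mask bands.

    candidates: iterable of (scale_factors, over) pairs, one per loop pass.
    """
    over_min = float("inf")
    best_sf = None
    for sf, over in candidates:
        if over < over_min:      # step 3n: compare with the minimum so far
            over_min = over      # step 3o: update the minimum
            best_sf = list(sf)   # step 3o: save as the suboptimal scale factors
        if over == 0:            # step 3f: an optimal combination was found
            break
    return best_sf, over_min


# Three loop passes; the second leaves the fewest bands over the masking level.
passes = [([0, 0], 3), ([1, 0], 1), ([1, 1], 2)]
best_sf, over_min = track_min_over(passes)
```

Copying the scale factors on each update matters in a real loop, where the working array is modified in place on the next pass.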
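[0055]
Step 3f then determines whether the over obtained in step 3m is 0 (over=0?). If over=0, that is, if the quantization distortion is at or below the masking level in every scale factor band, the loop is exited and processing moves to step 3g, where the current scale factors are saved. Step 3h calculates the number of bits used for encoding, and the difference between this number and the maximum number of available bits (the number of unused bits) is stored in the bit reservoir for encoding subsequent frames.
[0056]
Suppose, on the other hand, that step 3f finds that the quantization distortion exceeds the masking level in some scale factor band (over!=0). Step 3i then determines whether the scale factors can be modified. If all of them can be modified without exceeding the upper limit, step 3j increases them. Processing then returns to step 3d, and the loop of quantization and variable-length coding in steps 3d through 3j is repeated until the quantization distortion in every scale factor band is within the allowable range.
[0057]
In this loop, when the set bit rate is high and the maximum number of available bits is sufficient, a combination of scale factors that keeps the quantization distortion within the allowable range is found. When the set bit rate is low and few bits are available, the loop repeats until the scale factors reach their upper limit. If step 3i determines that the scale factors cannot be modified without exceeding the upper limit, step 3p restores the scale factors to the saved suboptimal combination. Step 3h also calculates the number of bits used for encoding, and the difference between this number and the maximum number of available bits (the number of unused bits) is stored in the bit reservoir for encoding subsequent frames.
[0058]
As described above, in the second embodiment, executing steps 3n, 3o, 3f, 3i, and 3p saves the scale factors at which the number of scale factor bands whose quantization distortion exceeds the masking level (over) reaches its minimum (over_min) within the loop. When no combination of scale factors is found that brings the quantization distortion to or below the masking level in every scale factor band, a suboptimal combination is obtained from the saved scale factors.
[0059]
Therefore, even when no such combination is found because the set bit rate is low and the maximum number of available bits is insufficient, encoding is performed using suboptimal scale factors.
[0060]
According to the second embodiment, this combines with the elimination of the imbalance in the maximum number of available bits (max_bits) between the left and right channels described in the first embodiment, so that degradation of the encoding quality of the input audio signal AS can be avoided even more effectively. The second embodiment also has the advantage that a suboptimal combination of scale factors is obtained by the comparatively simple processing of merely comparing numbers of scale factor bands.
[0061]
(Third embodiment)
The third embodiment of the invention focuses on the fact that the widths of the scale factor bands are set non-uniformly, narrow at low frequencies and wider toward high frequencies, in order to reduce the bands that fail to satisfy the masking level. When the iteration loop of quantization, variable-length coding, and buffer control repeatedly modifies the scale factors in search of an optimal combination for which the quantization distortion is at or below the masking level, the scale factors for which the total bandwidth of the scale factor bands whose quantization distortion exceeds the masking level is smallest are retained. If no optimal combination of scale factors is found that brings the quantization distortion to or below the masking level in every scale factor band, a suboptimal combination is obtained from the retained scale factors.
[0062]
FIG. 5 is a flowchart showing the procedure and content of the iteration loop of quantization, variable-length coding, and buffer control in the audio encoder according to the third embodiment. Parts identical to those in FIG. 3 are given the same reference signs and are not described in detail. The configuration of the audio encoder is the same as in FIG. 1 and is not described again here.
[0063]
Once step 3m has calculated the number of scale factor bands in which the quantization distortion exceeds the masking level (over), step 5a calculates the total bandwidth (width) of all scale factor bands in which the quantization distortion exceeds the masking level. Step 5b then compares the calculated width with its minimum so far (width_min). If width < width_min, step 5c updates width_min to the calculated width. Step 5c also saves the current scale factors as the suboptimal scale factors.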
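The quantity computed in step 5a can be sketched in Python as below. The name `width` comes from the text; the function name `over_width`, the per-band lists, and the sample band widths are hypothetical values chosen for illustration.

```python
def over_width(distortion, mask, band_width):
    """Total bandwidth of the bands whose distortion exceeds the masking level."""
    return sum(w for d, m, w in zip(distortion, mask, band_width) if d > m)


# Band widths grow toward high frequencies (illustrative values).
band_width = [4, 8, 16]
mask = [4.0, 2.0, 3.0]

# Candidate A fails only the widest band; candidate B fails the two narrow ones.
width_a = over_width([3.0, 1.0, 9.0], mask, band_width)
width_b = over_width([5.0, 2.5, 1.0], mask, band_width)
```

Under this criterion candidate B is preferred even though it has more bands over the masking level, because those bands cover less total bandwidth; a plain band count, as in the second embodiment, would prefer candidate A.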
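[0064]
Step 3f then determines whether the over obtained in step 3m is 0 (over=0?). If over=0, that is, if the quantization distortion is at or below the masking level in every scale factor band, the loop is exited and processing moves to step 3g, where the current scale factors are saved. Step 3h calculates the number of bits used for encoding, and the difference between this number and the maximum number of available bits (the number of unused bits) is stored in the bit reservoir for encoding subsequent frames.
[0065]
Suppose, on the other hand, that step 3f finds that the quantization distortion exceeds the masking level in some scale factor band (over!=0). Step 3i then determines whether the scale factors can be modified. If all of them can be modified without exceeding the upper limit, step 3j increases them. Processing then returns to step 3d, and the loop of quantization and variable-length coding in steps 3d through 3j is repeated until the quantization distortion in every scale factor band is within the allowable range.
[0066]
In this loop, when the set bit rate is high and the maximum number of available bits is sufficient, a combination of scale factors that keeps the quantization distortion within the allowable range is found. When the set bit rate is low and few bits are available, the loop repeats until the scale factors reach their upper limit. If step 3i determines that the scale factors cannot be modified without exceeding the upper limit, step 5d restores the scale factors to the saved suboptimal combination. Step 3h also calculates the number of bits used for encoding, and the difference between this number and the maximum number of available bits (the number of unused bits) is stored in the bit reservoir for encoding subsequent frames.
[0067]
As described above, in the third embodiment, executing steps 5a, 5b, and 5c saves the scale factors at which the total bandwidth (width) of all scale factor bands whose quantization distortion exceeds the masking level reaches its minimum (width_min) within the loop. If no optimal combination of scale factors is found that brings the quantization distortion to or below the masking level in every scale factor band, the saved scale factors are used for quantization and encoding as the suboptimal combination.
[0068]
Therefore, as in the second embodiment, even when no such combination is found because the set bit rate is low and the maximum number of available bits is insufficient, encoding is performed using suboptimal scale factors.
[0069]
In the third embodiment as well, this combines with the elimination of the imbalance in the maximum number of available bits (max_bits) between the left and right channels described in the first embodiment, so that degradation of the encoding quality of the input audio signal AS can be avoided effectively. In addition, quantization and encoding can be performed with emphasis on the narrow scale factor bands, that is, on the low-frequency components of the audio signal.
[0070]
(Fourth embodiment)
In the fourth embodiment of the invention, when the iteration loop of quantization, variable-length coding, and buffer control repeatedly modifies the scale factors in search of an optimal combination for which the quantization distortion is at or below the masking level, the scale factors for which the total excess distortion of the scale factor bands whose quantization distortion exceeds the masking level is smallest are retained. If no optimal combination of scale factors is found that brings the quantization distortion to or below the masking level in every scale factor band, a suboptimal combination is obtained from the retained scale factors.
[0071]
FIG. 6 is a flowchart showing the procedure and content of the iteration loop of quantization, variable-length coding, and buffer control in the audio encoder according to the fourth embodiment. Parts identical to those in FIG. 3 are given the same reference signs and are not described in detail. The configuration of the audio encoder is the same as in FIG. 1 and is not described again here.
[0072]
Once step 3m has calculated the number of scale factor bands in which the quantization distortion exceeds the masking level (over), step 6a calculates the sum (maskerr) of the absolute differences between the quantization distortion and the masking level (|quantization distortion - masking level|) over all scale factor bands in which the quantization distortion exceeds the masking level. In other words, it calculates the total excess distortion of those bands. Step 6b then compares the calculated maskerr with its minimum so far (maskerr_min). If maskerr < maskerr_min, step 6c updates maskerr_min to the calculated maskerr. Step 6c also saves the current scale factors as the suboptimal scale factors.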
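The quantity computed in step 6a can be sketched in Python as below. The name `maskerr` comes from the text; the function name `mask_error` and the sample values are hypothetical.

```python
def mask_error(distortion, mask):
    """Sum of |distortion - mask| over the bands that exceed the masking level."""
    return sum(abs(d - m) for d, m in zip(distortion, mask) if d > m)


mask = [4.0, 2.0, 3.0]
# Bands 0 and 2 exceed the masking level, by 1.0 and 6.0 respectively.
maskerr = mask_error([5.0, 1.0, 9.0], mask)
```

Because only bands above the masking level are summed, the absolute value equals the plain difference here; it is kept to mirror the expression in the text.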
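[0073]
Step 3f then determines whether the over obtained in step 3m is 0 (over=0?). If over=0, that is, if the quantization distortion is at or below the masking level in every scale factor band, the loop is exited and processing moves to step 3g, where the current scale factors are saved. Step 3h calculates the number of bits used for encoding, and the difference between this number and the maximum number of available bits (the number of unused bits) is stored in the bit reservoir for encoding subsequent frames.
[0074]
Suppose, on the other hand, that step 3f finds that the quantization distortion exceeds the masking level in some scale factor band (over!=0). Step 3i then determines whether the scale factors can be modified. If all of them can be modified without exceeding the upper limit, step 3j increases them. Processing then returns to step 3d, and the loop of quantization and variable-length coding in steps 3d through 3j is repeated until the quantization distortion in every scale factor band is within the allowable range.
[0075]
In this loop, when the set bit rate is high and the maximum number of available bits is sufficient, a combination of scale factors that keeps the quantization distortion within the allowable range is found. When the set bit rate is low and few bits are available, the loop repeats until the scale factors reach their upper limit. If step 3i determines that the scale factors cannot be modified without exceeding the upper limit, step 6d restores the scale factors to the saved suboptimal combination. Step 3h also calculates the number of bits used for encoding, and the difference between this number and the maximum number of available bits (the number of unused bits) is stored in the bit reservoir for encoding subsequent frames.
[0076]
As described above, in the fourth embodiment, executing steps 6a, 6b, and 6c saves the scale factors at which the sum (maskerr) of |quantization distortion - masking level| over all scale factor bands whose quantization distortion exceeds the masking level reaches its minimum (maskerr_min) within the loop. If no optimal combination of scale factors is found that brings the quantization distortion to or below the masking level in every scale factor band, the saved scale factors are used for quantization and encoding as the suboptimal combination.
[0077]
Therefore, as in the second and third embodiments, even when no such combination is found because the set bit rate is low and the maximum number of available bits is insufficient, encoding is performed using suboptimal scale factors.
[0078]
In the fourth embodiment as well, this combines with the elimination of the imbalance in the maximum number of available bits (max_bits) between the left and right channels described in the first embodiment, so that degradation of the encoding quality of the input audio signal AS can be avoided effectively. In addition, the quantization distortion is optimized, and quantization and encoding are performed so as to minimize it.
[0079]
(Other embodiments)
In the first embodiment, the power ratio between the left and right channels is obtained, and the maximum number of available bits for each frame is controlled according to this power ratio. The invention is not limited to this: the ratio of the psychoacoustic entropy between the left and right channels may be obtained instead, and the maximum number of available bits for each frame may be controlled according to that ratio. The psychoacoustic entropy ratio can be obtained with the following equations.
sum_pe = x_pe[0] + x_pe[1] (6)
x_ratio[ch] = x_pe[ch] / sum_pe, ch = 0, 1 (7)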
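Equations (6) and (7) translate directly into code. In this Python sketch the identifiers `x_pe`, `sum_pe`, and `x_ratio` come from the text, while the function name `pe_ratio`, the sample entropy values, and the even split when both entropies are zero are assumptions.

```python
def pe_ratio(x_pe):
    """Per-channel psychoacoustic entropy ratio, equations (6) and (7)."""
    sum_pe = x_pe[0] + x_pe[1]                    # equation (6)
    if sum_pe == 0:
        return [0.5, 0.5]                         # assumed convention, not in the text
    return [x_pe[ch] / sum_pe for ch in (0, 1)]   # equation (7)


x_ratio = pe_ratio([300.0, 100.0])
```

The resulting `x_ratio[ch]` would then take the place of the power ratio in equation (4).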
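[0080]
As a means of detecting the difference in the amount of information between channels, the difference in power or the difference in psychoacoustic entropy may be obtained instead of the power ratio or the psychoacoustic entropy ratio, and differences in other factors may also be detected.
[0081]
The embodiments above describe encoding an audio signal composed of two channels, left and right, but the invention is also applicable to encoding audio signals with three or more channels.
[0082]
Furthermore, the embodiments above describe implementing the iteration loop processing by executing a program, but it can also be implemented in hardware.
[0083]
In addition, the configuration of the audio encoder, the procedure and content of the iteration loop processing, the means of detecting the difference in the amount of information between channels, and so on can be modified in various ways without departing from the gist of the invention.
[0084]
[Effects of the Invention]
As described in detail above, the present invention provides an audio encoder, and an encoding program for it, that encode an audio signal composed of a plurality of channels using variable-length coding and a bit reservoir technique. The encoder detects the difference in the amount of information between the channels of the input audio signal and, based on the detection result, corrects the number of available bits allocated to each frame of the audio signal. Quantization and variable-length coding are then repeated on the signal of each channel, within the corrected number of available bits, until the quantization distortion is at or below a predetermined amount, and the results of the quantization and variable-length coding are formatted into an audio bitstream.
[0085]
According to the invention, therefore, when encoding a multichannel audio signal such as a stereo signal, an appropriate number of bits can be allocated to each of the channels even when the set bit rate is low and the maximum number of available bits is insufficient. The invention can thus provide an audio encoder, and an encoding processing program for it, that reduce the variation in encoding quality between channels.
[Brief Description of the Drawings]
[FIG. 1] A functional block diagram showing a first embodiment of an audio encoder according to the present invention.
[FIG. 2] A flowchart showing the overall procedure and content of the iteration loop in the audio encoder shown in FIG. 1.
[FIG. 3] A flowchart showing the procedure and content of the iteration loop subroutine in the audio encoder shown in FIG. 1.
[FIG. 4] A flowchart showing the procedure and content of the iteration loop of quantization, variable-length coding, and buffer control according to a second embodiment of the audio encoder of the present invention.
[FIG. 5] A flowchart showing the procedure and content of the iteration loop subroutine of quantization, variable-length coding, and buffer control according to a third embodiment of the audio encoder of the present invention.
[FIG. 6] A flowchart showing the procedure and content of the iteration loop subroutine of quantization, variable-length coding, and buffer control according to a fourth embodiment of the audio encoder of the present invention.
[Description of Reference Signs]
1: hybrid filter bank, 2: psychoacoustic analysis unit, 3: iteration loop, 4: bitstream forming unit, 11: subband analysis filter bank, 12: adaptive-block-length MDCT, 13: aliasing reduction butterfly, 21: fast Fourier transform (FFT) unit, 22: unpredictability measurement unit, 23: SMR calculation unit, 24: psychoacoustic entropy evaluation unit, 31: nonlinear quantization unit, 32: scale factor calculation unit, 33: buffer control unit, 34: Huffman coding unit, 35: side information encoding unit.
[0001]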
BACKGROUND OF THE INVENTION
The present invention relates to an audio encoder provided in, for example, an MPEG (Moving Picture Experts Group) audio recorder, and to an encoding processing program used by that encoder to encode audio signals.
[0002]
[Prior art]
MPEG audio recorders use audio encoders that employ psychoacoustic analysis. This type of encoder first converts an input audio signal from a time domain signal to a frequency domain signal by a time / frequency conversion unit. At the same time, the psychoacoustic analysis unit calculates a signal-to-mask ratio (SMR) and psychoacoustic entropy for each scale factor band using fast Fourier transform (FFT) analysis or the like. Next, a quantization error masking level based on psychoacoustic perception is calculated from the SMR for each scale factor band by a repetition loop of quantization / variable length coding / buffer control. Then, the quantization process and the variable length coding process are repeatedly executed within the range of the usable number of bits, thereby obtaining an optimum scale factor so that the quantization error is less than the masking level.
[0003]
Further, since the variable length coding method is used as the coding method in the above repeating loop, the coding amount generated in each frame changes. For this reason, bit reservoir technology is used. That is, in a frame with a small amount of encoding, the surplus bits are stored in bit storage within a range not exceeding the maximum storage amount. The accumulated bits are used in a frame that requires many bits for encoding, such as the rising edge of an audio signal. Whether many bits are required is determined based on psychoacoustic entropy.
[0004]
The scale factors obtained by the iteration loop are input to the bitstream forming unit. The bitstream forming unit forms an audio bitstream of a predetermined format from the codes of the optimal scale factors obtained by the loop, the variable-length codes of the corresponding quantized values, and the side information. The audio bitstream thus formed is then transmitted or stored on a recording medium (see, for example, Non-Patent Document 1).
[0005]
The iteration loop of quantization, variable-length coding, and buffer control performs the following processing. First, the masking level of the quantization error, based on psychoacoustics, is calculated for each scale factor band from the per-band SMR described above. Next, the number of bits to add (add_bits) is determined by taking into account the number of bits stored in the bit reservoir and the psychoacoustic entropy. This number of additional bits (add_bits) is added to the average number of bits per frame (mean_bits), which follows from the set bit rate, and the sum gives the maximum number of available bits (max_bits).
[0006]
Once the maximum number of available bits (max_bits) has been determined, the scale factors and related values are initialized, and the actual frequency samples are quantized using the current scale factors. The quantized values are variable-length coded, which gives the number of bits required. If the required number of bits exceeds the maximum number of available bits, the quantization step size is increased until the required number of bits is no greater than the number of available bits, which holds down the required number of bits.
[0007]
Next, the quantized values thus obtained are compared with the original frequency samples to obtain the quantization distortion of each scale factor band, and the number of scale factor bands in which the quantization distortion exceeds the masking level (over) is determined. If the quantization distortion is at or below the masking level in every scale factor band (over = 0), the loop is exited, the scale factors at that point are saved, and the number of bits used for encoding is calculated. If there is a difference between the number of bits used and the maximum number of available bits (the number of unused bits), the unused bits are stored in the bit reservoir for use in encoding subsequent frames.
[0008]
If, on the other hand, the quantization distortion exceeds the masking level in some scale factor band (over != 0), it is determined whether all of the affected scale factors can be modified without exceeding their upper limit. If they can, those scale factors are increased. The loop is then repeated.
[0009]
When the set bit rate is high and the maximum number of available bits is sufficient, a combination of scale factors that keeps the quantization distortion within the allowable range is found. When few bits are available, the loop is executed repeatedly until the scale factors reach their upper limit. In that case the scale factors are judged to be unmodifiable, and a suboptimal combination of scale factors is saved. The number of bits used for encoding is then calculated, and if there is a difference between the number of bits used and the maximum number of available bits (the number of unused bits), the unused bits are stored in the bit reservoir for use in encoding subsequent frames.
[0010]
When the input audio signal is a stereo signal, the above-described quantization / variable length encoding / buffer control loop process is repeatedly performed for each of the left and right channel signals.
[0011]
[Non-Patent Document 1]
Comprehensive multimedia book "MPEG" The Institute of Image Information and Television Engineers
Chapter 6 MPEG Audio Coding (especially P141 to P153)
[0012]
[Problems to be solved by the invention]
In such a conventional audio encoder, however, the maximum number of available bits is calculated sequentially for each frame of the input audio signal from the number of bits stored in the bit reservoir, and the calculated number of bits is allocated to the encoding of that frame. When the input audio signal is a stereo signal, this processing is likewise carried out so that the result of processing one channel is carried over to the processing of the other channel.
[0013]
As a result, when the set bit rate is low and the maximum number of available bits is insufficient, the channel of the stereo signal that is encoded first is allocated more bits, and the encoding quality of the channel encoded later tends to deteriorate.
[0014]
The present invention was made in view of these circumstances. Its object is to provide an audio encoder, and an encoding processing program for it, that can allocate an appropriate number of bits to each of a plurality of channels when encoding a multichannel audio signal such as a stereo signal, even when the set bit rate is low, and that thereby reduce the variation in encoding quality between channels.
[0015]
[Means for Solving the Problems]
In order to achieve the above object, the present invention provides an audio encoder that encodes an audio signal composed of a plurality of channels using a variable-length encoding scheme and a bit storage technique, and an encoding program for the audio encoder. A difference in the amount of information between each channel of the audio signal is detected, and the number of usable bits assigned to each frame of the audio signal is corrected based on the detection result. Then, the process of quantizing and variable-length coding the audio signal of each channel based on the scale factor is repeated until the quantization distortion falls below the masking level within the range of the corrected usable number of bits. The result obtained by the quantization process and the variable length coding process is formatted into an audio bitstream.
[0016]
Specifically, the power ratio between the channels of the audio signal is detected, and the number of usable bits determined by the bit assignment is corrected based on the detected power ratio.
[0017]
Alternatively, the ratio between channels of the psychoacoustic entropy obtained from the audio signal by psychoacoustic analysis is detected, and the number of available bits determined by the bit allocation is corrected based on the detected psychoacoustic entropy ratio.
[0018]
According to the invention, therefore, the difference in the amount of information between the channels of the audio signal is detected before the iteration loop of quantization, variable-length coding, and buffer control, and the number of available bits determined by the bit allocation is corrected based on the detection result. Consequently, even when the set bit rate is low and the maximum number of available bits is insufficient, an appropriate number of bits is allocated to each channel. Not only the channel encoded first but also the channel encoded later can thus be encoded without loss of quality, which reduces the variation in encoding quality between channels.
[0019]
In addition, in the above iterative loop, when the scale factor is repeatedly corrected to obtain an optimal combination of scale factors in which the quantization distortion is less than the masking level, the quantization distortion is less than the masking level in all scale factor bands. When an optimal combination of scale factors is not found, a sub-optimal combination of scale factors is obtained.
[0020]
The following means can be considered as means for obtaining a suboptimal combination of scale factors.
The first means retains the scale factors for which the number of scale factor bands whose quantization distortion exceeds the masking level is smallest. If no optimal combination of scale factors is found that brings the quantization distortion to or below the masking level in every scale factor band, a suboptimal combination is obtained from the retained scale factors.
[0021]
The second means retains the scale factors for which the sum of values weighted by the bandwidths of the scale factor bands whose quantization distortion exceeds the masking level, that is, the total bandwidth of those bands, is smallest. If no optimal combination of scale factors is found that brings the quantization distortion to or below the masking level in every scale factor band, a suboptimal combination is obtained from the retained scale factors.
The third means retains the scale factors for which the sum of the differences between the quantization distortion and the masking level, taken over the scale factor bands whose quantization distortion exceeds the masking level, that is, the total excess distortion of those bands, is smallest. If no optimal combination of scale factors is found that brings the quantization distortion to or below the masking level in every scale factor band, a suboptimal combination is obtained from the retained scale factors.
[0022]
By preparing suboptimal scale factors in this way, degradation in the quality of the encoded audio signal can be suppressed even when the set bit rate is low and no optimal combination of scale factors is found that brings the quantization distortion to or below the masking level in every scale factor band.
[0023]
DETAILED DESCRIPTION OF THE INVENTION
(First embodiment)
FIG. 1 is a functional block diagram showing a first embodiment of an audio encoder according to the present invention. The audio encoder includes a hybrid filter bank 1, a psychoacoustic analysis unit 2, a repetitive loop 3, and a bit stream formation unit 4.
[0024]
First, the psychoacoustic analysis unit 2 includes a fast Fourier transform (FFT) unit 21, a non-predictability measurement unit 22, a signal-to-mask ratio (SMR) calculation unit 23, A psychoacoustic entropy evaluation unit 24;
[0025]
For example, a PCM stereo audio signal AS that is 16-bit linearly quantized by a PCM (Pulse Code Modulation) encoding unit (not shown) is input to the psychoacoustic analysis unit 2. The input PCM stereo audio signal AS is subjected to FFT analysis by the fast Fourier transform unit 21, and then unpredictability is measured by the non-predictability measurement unit 22. The SMR calculation unit 23 calculates an SMR for each subband (scale factor band) of the input PCM stereo audio signal AS based on the FFT analysis. The psychoacoustic entropy evaluation unit 24 obtains psychoacoustic entropy based on the calculated SMR.
[0026]
The model of psychoacoustic analysis is described in detail in Sugiyama: “Highly efficient encoding of acoustic signals”, serial lecture “Basic Technology of Digital Television Broadcasting”, TV Magazine, 48, 4, pp.447-454 (Apr.1994).
[0027]
The hybrid filter bank 1 includes a subband analysis filter bank 11, an adaptive block length modified discrete cosine transform (MDCT) unit 12, and an aliasing distortion reduction butterfly 13.
[0028]
The input PCM stereo audio signal AS is converted from a time domain signal to a frequency domain signal by the subband analysis filter bank 11 and divided into, for example, 32 subband signals (scale factor bands). For the subband analysis, a 512-tap polyphase filter bank (PFB) is used, for example.
[0029]
The adaptive block length MDCT unit 12 is for suppressing the pre-echo, and maps each of the divided subband signals to a finer spectral line. At this time, the block length of the adaptive block length MDCT unit 12 is determined based on the psychoacoustic entropy using the unpredictability obtained by the psychoacoustic entropy evaluation unit 24. The aliasing distortion reduction butterfly 13 removes aliasing distortion in the frequency domain included in the mapping signal obtained by the adaptive block length MDCT unit 12.
[0030]
The iterative loop 3 includes a nonlinear quantization unit 31, a scale factor calculation unit 32, a buffer control unit 33, a Huffman coding unit 34, and a side information coding unit 35.
[0031]
The scale factor calculation unit 32 calculates, for each scale factor band, the masking level of the quantization error based on the psychoacoustic model from the signal-to-mask ratio (SMR) obtained for each scale factor band by the SMR calculation unit 23. It also determines the maximum number of bits that can be used for nonlinear quantization based on the number of bits stored in the bit storage of the buffer control unit 33 and the psychoacoustic entropy obtained by the psychoacoustic entropy evaluation unit 24. Prior to determining the maximum number of usable bits, the scale factor calculation unit 32 calculates the power ratio between the left and right channels of the input audio signal. The calculated power ratio is used to correct the maximum number of usable bits. The corrected number of usable bits is given to the nonlinear quantization unit 31 for nonlinear quantization processing.
[0032]
The non-linear quantization unit 31 performs non-linear quantization of the mapping signal output from the aliasing distortion reduction butterfly 13 in accordance with the bit allocation by the scale factor calculation unit 32. The quantized value obtained by this non-linear quantization is variable length encoded by the Huffman encoder 34, and the required number of bits is obtained. The obtained necessary number of bits is compared with the maximum number of available bits, and the quantization step size is changed based on the comparison result.
[0033]
The above nonlinear quantization and variable length coding are accompanied by an iterative loop, and are repeated until the quantization error is below the masking level in all scale factor bands. When the quantization error is less than or equal to the masking level in all scale factor bands, the scale factor obtained at this time is used. In this iterative loop, the number of bits used for encoding is calculated, and the number of unused bits is stored in the bit storage of the buffer control unit 33 for encoding after the next frame.
[0034]
The side information encoding unit 35 encodes the scale factor calculated by the scale factor calculation unit 32 as side information together with the bit allocation information and the Huffman table after the end of the above repetition loop.
[0035]
The bit stream formation unit 4 multiplexes the header, the data encoded by the Huffman coding unit 34, and the side information encoded by the side information encoding unit 35 in accordance with a predetermined format to form a bit stream. The bit stream thus formed is then stored in a storage medium or transmitted over a communication path.
[0036]
Next, the processing operation by the repetition loop 3 of stereo signal quantization / variable length encoding / buffer control in the audio encoder configured as described above will be described. This processing operation is realized by causing a microcomputer or DSP (Digital Signal Processor) to execute the program.
[0037]
FIG. 2 is a flowchart showing the entire processing procedure and processing contents (main routine). In the repetition loop 3, first, in step 2a, the power ratio between the left and right channels of the input stereo audio signal AS is calculated. This power ratio is calculated by the following equation.
[Expression 1]

[0038]
In the repetition loop 3, next, the repetition loop processing of quantization / variable length coding / buffer control is sequentially performed on both the left and right channels of the input stereo audio signal AS.
[0039]
That is, in step 2b, the channel number ch is first initialized (ch = 0). Then, for the input audio signal of the channel corresponding to the initialized channel number ch = 0, for example, the left channel, the iterative loop process of quantization / variable length coding / buffer control is executed in step 2d.
[0040]
When the loop processing for the left channel is completed, the channel number ch is incremented (ch++) in step 2e, and the iterative loop processing of quantization / variable length coding / buffer control is executed in step 2d for the channel corresponding to the incremented channel number ch = 1, for example, the right channel. When the iterative loop processing for both the left and right channels is completed, the end of this processing is confirmed in step 2c, and the iterative loop processing ends.
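As an illustration only, the channel-by-channel control flow of steps 2b to 2e might be sketched in Python as follows. The function name run_main_routine and the caller-supplied iterative_loop callback (standing in for the step 2d subroutine) are hypothetical and are not part of the specification; the power ratio calculation of step 2a is omitted.

```python
def run_main_routine(channel_signals, iterative_loop):
    """Apply the per-channel iterative loop to each channel in order.

    iterative_loop is a hypothetical callback standing in for the
    step 2d subroutine; it is called as iterative_loop(ch, signal).
    """
    results = []
    ch = 0                                # step 2b: initialize channel number
    while ch < len(channel_signals):      # step 2c: all channels processed?
        results.append(iterative_loop(ch, channel_signals[ch]))  # step 2d
        ch += 1                           # step 2e: ch++
    return results
```

Because the channels are processed strictly in order, the channel handled first is the first to draw on the shared bit storage, which is the situation the power ratio correction is intended to address.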
[0041]
By the way, the iterative loop processing of quantization / variable length coding / buffer control for the left and right channels is performed as follows. FIG. 3 is a flowchart showing the processing procedure and processing contents of the subroutine.
[0042]
That is, in step 3a, the masking level of the quantization error based on psychoacoustics is calculated for each scale factor band from the SMR for each scale factor band calculated by the SMR calculation unit 23. Subsequently, in step 3b, the number of bits to be added (add_bits) is obtained in consideration of the number of bits accumulated in the bit storage of the buffer control unit 33 and the psychoacoustic entropy. The obtained number of additional bits (add_bits) is then corrected as follows based on the power ratio (x_ratio[ch]) between the left and right channels calculated earlier in step 2a.
add_bits = add_bits * x_ratio [ch] (4)
[0043]
The corrected number of additional bits (add_bits) is added to the average number of bits per frame (mean_bits) based on the set bit rate, thereby determining the maximum number of usable bits (max_bits). The calculation formula is shown below.
max_bits = mean_bits + add_bits (5)
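As a minimal sketch, equations (4) and (5) might be combined in Python as follows. The function name usable_max_bits and the truncation of the corrected value to a whole number of bits are assumptions; the specification does not state how fractional bits are handled.

```python
def usable_max_bits(mean_bits, add_bits, x_ratio_ch):
    """Return max_bits for one channel from equations (4) and (5)."""
    # Equation (4): scale the additional bits by this channel's ratio.
    # Truncating to an integer is an assumption made for this sketch.
    add_bits = int(add_bits * x_ratio_ch)
    # Equation (5): add the corrected additional bits to the frame average.
    return mean_bits + add_bits
```

For example, with mean_bits = 1000 and add_bits = 400, a channel whose ratio is 0.75 would be given 1300 bits and a channel whose ratio is 0.25 would be given 1100 bits.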
[0044]
When the maximum number of bits (max_bits) that can be used is calculated in this way, initial settings such as a scale factor are performed in step 3c, and then quantization and variable length coding processing are executed in step 3d. In step 3d, a process of quantizing an actual frequency sample based on the current scale factor is performed. Then, the quantized value is subjected to variable length coding by the Huffman coding unit 34, thereby obtaining the number of bits necessary for coding.
[0045]
The required number of bits thus obtained is compared with the maximum number of available bits (max_bits). If the number of bits required for encoding exceeds max_bits, the quantization step size is increased until the required number of bits no longer exceeds max_bits, thereby suppressing the required number of bits.
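The bit-budget control of step 3d might be sketched as follows, under several simplifying assumptions that are not taken from the specification: a uniform quantizer replaces the nonlinear quantizer, a toy bit counter (count_bits) replaces Huffman coding, and the step size is enlarged by a factor of 2^(1/4) per iteration. All names are hypothetical.

```python
def count_bits(quantized):
    """Toy stand-in for Huffman coding: 1 bit for a zero value,
    otherwise the magnitude bits plus one sign bit."""
    return sum(1 if q == 0 else abs(q).bit_length() + 1 for q in quantized)


def fit_to_budget(samples, max_bits, step=1.0, growth=2 ** 0.25):
    """Enlarge the quantization step until the coded size fits max_bits."""
    if max_bits < len(samples):
        # The toy coder needs at least 1 bit per sample, so the loop
        # could never terminate below this budget.
        raise ValueError("max_bits is below the toy coder's minimum")
    while True:
        quantized = [int(round(x / step)) for x in samples]
        bits = count_bits(quantized)
        if bits <= max_bits:
            return step, quantized, bits
        step *= growth  # coarser quantization, hence fewer bits
```

A larger step size reduces the magnitudes of the quantized values, so the required number of bits falls at the cost of greater quantization distortion; that distortion is what the subsequent steps check against the masking level.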
[0046]
In step 3e, the quantization value determined in step 3d is compared with the original frequency sample, and the quantization distortion for each scale factor band is obtained. In step 3m, the number of scale factor bands (over) at which the obtained quantization distortion exceeds the masking level is obtained. Then, it is determined in step 3f whether or not the obtained over is 0 (over = 0?).
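Steps 3e and 3m might be sketched as follows. The specification does not define the distortion measure, so the summed squared error per band used here is an assumption, as are the function names and the representation of bands as a list of boundary indices (band_edges).

```python
def band_distortion(original, reconstructed, band_edges):
    """Step 3e (sketch): squared-error distortion per scale factor band.

    band_edges lists the boundary indices, so band k covers
    original[band_edges[k]:band_edges[k + 1]].
    """
    return [
        sum((original[i] - reconstructed[i]) ** 2 for i in range(lo, hi))
        for lo, hi in zip(band_edges, band_edges[1:])
    ]


def count_over(distortion, masking_level):
    """Step 3m: number of bands whose distortion exceeds the masking level."""
    return sum(1 for d, m in zip(distortion, masking_level) if d > m)
```

The determination of step 3f then reduces to checking whether the value returned by count_over is 0.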
[0047]
As a result of this determination, if over is 0, that is, if the quantization distortion of all scale factor bands is less than or equal to the masking level, the process exits the iterative loop and proceeds to step 3g. In step 3g, the scale factors at this time are stored. In step 3h, the number of bits used for encoding is calculated, and the difference between this number of used bits and the maximum number of available bits (the number of unused bits) is accumulated in the bit storage for encoding the subsequent frames.
[0048]
On the other hand, suppose the determination in step 3f finds that the quantization distortion exceeds the masking level in some scale factor band (over != 0). In this case, it is determined in step 3i whether the scale factors can be corrected. If all the scale factors can be corrected without exceeding the upper limit, they are increased in step 3j. After this correction, the process returns to step 3d, and the iterative loop of quantization and variable length coding in steps 3d to 3j is repeated until the quantization distortion of all scale factor bands falls within the allowable range.
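Steps 3i and 3j might be sketched as follows. The specification does not say which scale factors are increased or by how much; this sketch assumes that only the bands whose distortion exceeds the masking level are incremented, and by 1 each. The function name and parameters are hypothetical.

```python
def correct_scale_factors(scale_factors, over_bands, upper_limit):
    """Return corrected scale factors, or None if correction is impossible.

    over_bands holds the indices of the bands whose quantization
    distortion exceeds the masking level (an assumption of this sketch).
    """
    # Step 3i: every band to be corrected must stay within the upper limit.
    if any(scale_factors[b] + 1 > upper_limit for b in over_bands):
        return None
    # Step 3j: increase the scale factors of the offending bands.
    corrected = list(scale_factors)
    for b in over_bands:
        corrected[b] += 1
    return corrected
```

A return value of None corresponds to the case in which the loop can no longer continue and a sub-optimal combination of scale factors has to be used.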
[0049]
In the iterative loop, when the set bit rate is high and the maximum number of usable bits is sufficient, a combination of scale factors for which the quantization distortion is within the allowable range is found. On the other hand, when the set bit rate is low and the number of usable bits is small, the loop is repeated until a scale factor reaches the upper limit value. If it is determined in step 3i that a scale factor would exceed the upper limit value and cannot be corrected, a sub-optimal combination of scale factors is selected and stored in step 3k. At the same time, the number of bits used for encoding is calculated in step 3h, and the difference between the number of used bits and the maximum number of available bits (the number of unused bits) is accumulated in the bit storage of the buffer control unit 33 for encoding the subsequent frames.
[0050]
As described above, in the first embodiment, the iterative loop 3 first calculates the power ratio (x_ratio[ch]) between the left and right channels of the stereo audio signal AS. The maximum number of bits (max_bits) that can be used for each frame of the stereo audio signal AS is then determined by correcting the number of additional bits (add_bits) based on the power ratio (x_ratio[ch]) between the channels. Quantization and variable length coding are performed on the frame of the stereo audio signal AS within the range of the determined maximum number of usable bits (max_bits), and scale factors at which the quantization distortion is equal to or less than the masking level are obtained.
[0051]
Therefore, the maximum number of available bits (max_bits) is controlled based on the power ratio (x_ratio[ch]) between the left and right channels of the input audio signal AS. For this reason, even if the set bit rate is low and the maximum number of usable bits (max_bits) is insufficient, the problem that many bits are allocated to the channel encoded first, leaving too few usable bits for the channel encoded later, is resolved. As a result, not only the channel encoded earlier but also the channel encoded later can be encoded without degrading quality, and the variation in encoding quality between the channels can be reduced.
[0052]
(Second Embodiment)
In the second embodiment of the present invention, when the scale factors are iteratively corrected in the iterative loop of quantization / variable length coding / buffer control to obtain the optimal combination of scale factors that brings the quantization distortion to or below the masking level, the scale factors obtained when the number of scale factor bands whose quantization distortion exceeds the masking level is at its minimum are retained. If no optimal combination of scale factors that brings the quantization distortion to or below the masking level in all scale factor bands is found, a sub-optimal combination of scale factors is obtained based on the retained scale factors.
[0053]
FIG. 4 is a flowchart showing a processing procedure and processing contents of a subroutine of an iterative loop of quantization / variable length encoding / buffer control by the audio encoder according to the second embodiment. In the figure, the same parts as those in FIG. The configuration of the audio encoder is also the same as that shown in FIG.
[0054]
When the number of scale factor bands (over) in which the quantization distortion exceeds the masking level has been calculated in step 3m, the over obtained in step 3m is compared in step 3n with the minimum value of the number of such scale factor bands (over_min). If this comparison shows that the obtained over is smaller than over_min, over_min is updated in step 3o to the over just obtained. At the same time, in step 3o, the current scale factors are stored as the sub-optimal scale factors.
[0055]
Subsequently, in step 3f, it is determined whether or not the over obtained in step 3m is 0 (over = 0?). As a result of this determination, if over = 0, that is, if the quantization distortion of all scale factor bands is less than or equal to the masking level, the process exits the iterative loop and proceeds to step 3g. In step 3g, the scale factors at this time are stored. In step 3h, the number of bits used for encoding is calculated, and the difference between this number of used bits and the maximum number of available bits (the number of unused bits) is accumulated in the bit storage for encoding the subsequent frames.
[0056]
On the other hand, suppose the determination in step 3f finds that the quantization distortion exceeds the masking level in some scale factor band (over != 0). In this case, it is determined in step 3i whether the scale factors can be corrected. If all the scale factors can be corrected without exceeding the upper limit, they are increased in step 3j. After this correction, the process returns to step 3d, and the iterative loop of quantization and variable length coding in steps 3d to 3j is repeated until the quantization distortion of all scale factor bands falls within the allowable range.
[0057]
When the set bit rate is high and the maximum number of usable bits is sufficient, the above iterative loop finds a combination of scale factors for which the quantization distortion is within the allowable range. On the other hand, when the set bit rate is low and the number of usable bits is small, the loop is repeated until a scale factor reaches the upper limit value. If it is determined in step 3i that a scale factor would exceed the upper limit value and cannot be corrected, the scale factors are restored in step 3p to the stored sub-optimal combination. At the same time, the number of bits used for encoding is calculated in step 3h, and the difference between the number of used bits and the maximum number of available bits (the number of unused bits) is stored in the bit storage for encoding the subsequent frames.
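The bookkeeping of steps 3n, 3o, 3f, and 3p might be sketched as follows. To keep the sketch self-contained, the quantization itself is abstracted away: the function receives the (scale_factors, over) pairs that successive loop iterations would produce. This interface and the function name are assumptions, not part of the specification.

```python
def select_scale_factors(iterations):
    """Return (scale_factors, is_optimal) from successive loop iterations.

    iterations yields (scale_factors, over) pairs in loop order, where
    over is the number of bands exceeding the masking level.
    """
    over_min = float("inf")
    suboptimal = None
    for scale_factors, over in iterations:
        if over < over_min:                  # step 3n: compare with over_min
            over_min = over                  # step 3o: update the minimum
            suboptimal = list(scale_factors)  # step 3o: keep these factors
        if over == 0:                        # step 3f: all bands are masked
            return list(scale_factors), True
    # Step 3p: the loop ended without over == 0, so fall back to the
    # stored sub-optimal combination.
    return suboptimal, False
```

Because only a strict improvement updates over_min, the earliest iteration that attains the minimum is the one retained.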
[0058]
As described above, in the second embodiment, by executing steps 3n, 3o, 3f, 3i, and 3p, the scale factors obtained when the number of scale factor bands (over) in which the quantization distortion exceeds the masking level is at its minimum (over_min) within the loop are stored. If no combination of scale factors that brings the quantization distortion to or below the masking level in all scale factor bands is found, the stored scale factors are used as the sub-optimal combination of scale factors.
[0059]
For this reason, even if no combination of scale factors that brings the quantization distortion to or below the masking level in all scale factor bands is found because the set bit rate is low and the maximum number of available bits is insufficient, encoding is performed using the sub-optimal scale factors.
[0060]
Therefore, according to the second embodiment, combined with the elimination of the imbalance in the maximum number of available bits (max_bits) between the left and right channels described in the first embodiment, degradation of the audio encoding quality of the input audio signal AS can be avoided more effectively. The second embodiment also has the advantage that a sub-optimal combination of scale factors can be obtained by the relatively simple process of merely comparing the number of scale factor bands.
[0061]
(Third embodiment)
In order to reduce the bandwidth that fails to satisfy the masking level, the third embodiment of the present invention focuses on the fact that the scale factor bandwidths are non-uniform, that is, the bands are narrow at low frequencies and become progressively wider toward high frequencies. In the iterative loop of quantization / variable length coding / buffer control, when the scale factors are iteratively corrected to find the optimal combination of scale factors that brings the quantization distortion to or below the masking level, the scale factors obtained when the sum of the bandwidths of the scale factor bands whose quantization distortion exceeds the masking level is at its minimum are retained. If no optimal combination of scale factors that brings the quantization distortion to or below the masking level in all scale factor bands is found, a sub-optimal combination of scale factors is obtained based on the retained scale factors.
[0062]
FIG. 5 is a flowchart showing the processing procedure and processing contents of an iterative loop of quantization / variable length encoding / buffer control by the audio encoder according to the third embodiment. In the figure, the same parts as those in FIG. The configuration of the audio encoder is also the same as that shown in FIG.
[0063]
When the number of scale factor bands (over) in which the quantization distortion exceeds the masking level has been calculated in step 3m, the total bandwidth (width) of all scale factor bands whose quantization distortion exceeds the masking level is calculated in step 5a. Next, in step 5b, the calculated width is compared with its minimum value (width_min). If this comparison shows that width < width_min, width_min is updated to the calculated width in step 5c. At the same time, in step 5c, the current scale factors are stored as the sub-optimal scale factors.
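The quantity computed in step 5a might be sketched as follows. The function name and the use of parallel lists for distortion, masking level, and bandwidth are assumptions made for illustration.

```python
def over_bandwidth(distortion, masking_level, band_widths):
    """Step 5a: total bandwidth of the bands that exceed the masking level."""
    return sum(
        w
        for d, m, w in zip(distortion, masking_level, band_widths)
        if d > m
    )
```

With band widths of 4, 8, and 16 spectral lines, a candidate that violates the masking level only in the widest band yields width = 16, whereas a candidate that violates it in the two narrower bands yields width = 12. The width criterion therefore prefers the latter even though its band count (over) is larger, which is where it differs from the count-based criterion of the second embodiment.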
[0064]
Subsequently, in step 3f, it is determined whether or not the over obtained in step 3m is 0 (over = 0?). As a result of this determination, if over = 0, that is, if the quantization distortion of all scale factor bands is less than or equal to the masking level, the process exits the iterative loop and proceeds to step 3g. In step 3g, the scale factors at this time are stored. In step 3h, the number of bits used for encoding is calculated, and the difference between this number of used bits and the maximum number of available bits (the number of unused bits) is accumulated in the bit storage for encoding the subsequent frames.
[0065]
On the other hand, suppose the determination in step 3f finds that the quantization distortion exceeds the masking level in some scale factor band (over != 0). In this case, it is determined in step 3i whether the scale factors can be corrected. If all the scale factors can be corrected without exceeding the upper limit, they are increased in step 3j. After this correction, the process returns to step 3d, and the iterative loop of quantization and variable length coding in steps 3d to 3j is repeated until the quantization distortion of all scale factor bands falls within the allowable range.
[0066]
When the set bit rate is high and the maximum number of usable bits is sufficient, the above iterative loop finds a combination of scale factors for which the quantization distortion is within the allowable range. On the other hand, when the set bit rate is low and the number of usable bits is small, the loop is repeated until a scale factor reaches the upper limit value. If it is determined in step 3i that a scale factor would exceed the upper limit value and cannot be corrected, the scale factors are restored in step 5d to the stored sub-optimal combination. At the same time, the number of bits used for encoding is calculated in step 3h, and the difference between the number of used bits and the maximum number of available bits (the number of unused bits) is stored in the bit storage for encoding the subsequent frames.
[0067]
As described above, in the third embodiment, by executing steps 5a, 5b, and 5c, the scale factors obtained when the total bandwidth (width) of all scale factor bands whose quantization distortion exceeds the masking level is at its minimum (width_min) within the loop are stored. If no optimal combination of scale factors that brings the quantization distortion to or below the masking level in all scale factor bands is found, the stored scale factors are used for the quantization and encoding processing as the sub-optimal combination of scale factors.
[0068]
For this reason, as in the second embodiment, even if no combination of scale factors that brings the quantization distortion to or below the masking level in all scale factor bands is found because the set bit rate is low and the maximum number of available bits is insufficient, encoding is performed using the sub-optimal scale factors.
[0069]
Therefore, in the third embodiment as well, combined with the elimination of the imbalance in the maximum number of usable bits (max_bits) between the left and right channels described in the first embodiment, degradation of the audio encoding quality of the input audio signal AS can be effectively avoided. In addition, it is possible to perform quantization and encoding processing that places importance on the narrow scale factor bands, that is, on the low frequency components of the audio signal.
[0070]
(Fourth embodiment)
In the fourth embodiment of the present invention, when the scale factors are iteratively corrected in the iterative loop of quantization / variable length coding / buffer control to obtain the optimal combination of scale factors that brings the quantization distortion to or below the masking level, the scale factors obtained when the sum of the quantization distortion errors in the scale factor bands whose quantization distortion exceeds the masking level is at its minimum are retained. If no optimal combination of scale factors that brings the quantization distortion to or below the masking level in all scale factor bands is found, a sub-optimal combination of scale factors is obtained based on the retained scale factors.
[0071]
FIG. 6 is a flowchart showing the processing procedure and processing contents of an iterative loop of quantization / variable length encoding / buffer control by the audio encoder according to the fourth embodiment. In the figure, the same parts as those in FIG. The configuration of the audio encoder is also the same as that shown in FIG.
[0072]
When the number of scale factor bands (over) in which the quantization distortion exceeds the masking level has been calculated in step 3m, the sum (maskerr) of the absolute values of the differences between the quantization distortion and the masking level (|quantization distortion - masking level|) over all scale factor bands whose quantization distortion exceeds the masking level is calculated in step 6a. That is, the sum of the quantization distortion errors in the scale factor bands where the quantization distortion exceeds the masking level is calculated. Next, in step 6b, the calculated maskerr is compared with its minimum value (maskerr_min). If this comparison shows that maskerr < maskerr_min, maskerr_min is updated to the calculated maskerr in step 6c. At the same time, in step 6c, the current scale factors are stored as the sub-optimal scale factors.
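The quantity computed in step 6a might be sketched as follows. The function name and the list-based inputs are assumptions made for illustration.

```python
def mask_error(distortion, masking_level):
    """Step 6a: sum of |distortion - masking level| over the bands
    whose quantization distortion exceeds the masking level."""
    return sum(
        abs(d - m)
        for d, m in zip(distortion, masking_level)
        if d > m
    )
```

Unlike the counts and bandwidths used in the second and third embodiments, this measure reflects how far each offending band overshoots its masking level, so a candidate with several slight violations can be preferred over one with a single large violation.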
[0073]
Subsequently, in step 3f, it is determined whether or not the over obtained in step 3m is 0 (over = 0?). As a result of this determination, if over = 0, that is, if the quantization distortion of all scale factor bands is less than or equal to the masking level, the process exits the iterative loop and proceeds to step 3g. In step 3g, the scale factors at this time are stored. In step 3h, the number of bits used for encoding is calculated, and the difference between this number of used bits and the maximum number of available bits (the number of unused bits) is accumulated in the bit storage for encoding the subsequent frames.
[0074]
On the other hand, suppose the determination in step 3f finds that the quantization distortion exceeds the masking level in some scale factor band (over != 0). In this case, it is determined in step 3i whether the scale factors can be corrected. If all the scale factors can be corrected without exceeding the upper limit, they are increased in step 3j. After this correction, the process returns to step 3d, and the iterative loop of quantization and variable length coding in steps 3d to 3j is repeated until the quantization distortion of all scale factor bands falls within the allowable range.
[0075]
When the set bit rate is high and the maximum number of usable bits is sufficient, the above iterative loop finds a combination of scale factors for which the quantization distortion is within the allowable range. On the other hand, when the set bit rate is low and the number of usable bits is small, the loop is repeated until a scale factor reaches the upper limit value. If it is determined in step 3i that a scale factor would exceed the upper limit value and cannot be corrected, the scale factors are restored in step 6d to the stored sub-optimal combination. At the same time, the number of bits used for encoding is calculated in step 3h, and the difference between the number of used bits and the maximum number of available bits (the number of unused bits) is stored in the bit storage for encoding the subsequent frames.
[0076]
As described above, in the fourth embodiment, by executing steps 6a, 6b, and 6c, the scale factors obtained when the sum (maskerr) of |quantization distortion - masking level| over all scale factor bands whose quantization distortion exceeds the masking level is at its minimum (maskerr_min) within the loop are stored. If no optimal combination of scale factors that brings the quantization distortion to or below the masking level in all scale factor bands is found, the stored scale factors are used for the quantization and encoding processing as the sub-optimal combination of scale factors.
[0077]
For this reason, as in the second and third embodiments, even if no combination of scale factors that brings the quantization distortion to or below the masking level in all scale factor bands is found because the set bit rate is low and the maximum number of available bits is insufficient, encoding is performed using the sub-optimal scale factors.
[0078]
Therefore, in the fourth embodiment as well, combined with the elimination of the imbalance in the maximum number of usable bits (max_bits) between the left and right channels described in the first embodiment, degradation of the audio encoding quality of the input audio signal AS can be effectively avoided. Further, quantization and encoding processing can be performed such that the quantization distortion is minimized.
[0079]
(Other embodiments)
In the first embodiment, the power ratio between the left and right channels is obtained, and the maximum number of bits that can be used for each frame is controlled based on this power ratio. However, the present invention is not limited to this; the ratio of psychoacoustic entropy between the left and right channels may be obtained instead, and the maximum number of bits that can be used for each frame may be controlled based on the obtained ratio. In this case, the ratio of psychoacoustic entropy can be obtained by the following equations.
sum_pe = x_pe[0] + x_pe[1] (6)
x_ratio[ch] = x_pe[ch] / sum_pe, ch = 0, 1 (7)
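Equations (6) and (7) might be written in Python as follows. The function name and the equal split returned when both entropies are zero are assumptions; the specification does not address that case.

```python
def entropy_ratios(x_pe):
    """Return [x_ratio[0], x_ratio[1]] from the per-channel
    psychoacoustic entropies x_pe[0] and x_pe[1]."""
    sum_pe = x_pe[0] + x_pe[1]                    # equation (6)
    if sum_pe == 0:
        # Assumption of this sketch: split evenly when both are zero.
        return [0.5, 0.5]
    return [x_pe[ch] / sum_pe for ch in (0, 1)]   # equation (7)
```

For example, entropies of 300 and 100 give ratios of 0.75 and 0.25, so the channel with the larger psychoacoustic entropy receives the larger share of the additional bits when the ratio is applied as in equation (4).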
[0080]
Note that, as the means for detecting the difference in the amount of information between channels, a difference in power or in psychoacoustic entropy may be detected instead of the power ratio or the ratio of psychoacoustic entropy.
[0081]
Further, although cases have been described with the above embodiments as examples where audio signals consisting of two channels on the left and right are encoded, the present invention is also applicable when encoding audio signals of three or more channels.
[0082]
Further, although cases have been described with the above embodiments as an example where the processing of the repetitive loop is realized by execution of a program, it can also be realized by hardware.
[0083]
In addition, the configuration of the audio encoder, the processing procedure and contents of the iterative loop, the means for detecting the difference in the amount of information between channels, and the like can be variously modified without departing from the scope of the present invention.
[0084]
[Effect of the invention]
As described above in detail, according to the present invention, in an audio encoder that encodes an audio signal composed of a plurality of channels using a variable-length encoding scheme and a bit storage technique, and in its encoding processing program, a difference in the amount of information between the channels of the input audio signal is detected, and the number of usable bits allocated to each frame of the audio signal is corrected based on the detection result. Then, the quantization process and the variable length coding process are repeatedly performed on the signal of each channel within the range of the corrected number of usable bits until the quantization distortion becomes a predetermined amount or less, and the result obtained by the variable length coding process is formatted into an audio bitstream.
[0085]
Therefore, according to the present invention, it is possible to provide an audio encoder and an encoding processing program that, when encoding an audio signal having a plurality of channels such as a stereo signal, can reduce the variation in encoding quality between the channels even if the set bit rate is low and the maximum number of usable bits is insufficient.
[Brief description of the drawings]
FIG. 1 is a functional block diagram showing a first embodiment of an audio encoder according to the present invention.
FIG. 2 is a flowchart showing the overall processing procedure and processing contents of the iterative loop performed by the audio encoder shown in FIG. 1.
FIG. 3 is a flowchart showing the processing procedure and processing contents of a subroutine of the iterative loop performed by the audio encoder shown in FIG. 1.
FIG. 4 is a flowchart showing the processing procedure and processing contents of the quantization/variable-length coding/buffer control iterative loop according to a second embodiment of the audio encoder according to the present invention.
FIG. 5 is a flowchart showing the processing procedure and processing contents of a subroutine of the quantization/variable-length coding/buffer control iterative loop according to a third embodiment of the audio encoder according to the present invention.
FIG. 6 is a flowchart showing the processing procedure and processing contents of a subroutine of the quantization/variable-length coding/buffer control iterative loop according to a fourth embodiment of the audio encoder according to the present invention.
[Explanation of symbols]
1 ... Hybrid filter bank, 2 ... Psychoacoustic analysis unit, 3 ... Iterative loop, 4 ... Bitstream forming unit, 11 ... Subband analysis filter bank, 12 ... Adaptive block-length MDCT, 13 ... Aliasing reduction butterfly, 21 ... Fast Fourier transform (FFT) unit, 22 ... Unpredictability measurement unit, 23 ... SMR calculation unit, 24 ... Psychoacoustic entropy evaluation unit, 31 ... Nonlinear quantization unit, 32 ... Scale factor calculation unit, 33 ... Buffer control unit, 34 ... Huffman coding unit, 35 ... Side information coding unit.

Claims

An audio encoder that encodes an audio signal composed of a plurality of channels by a variable-length coding scheme using a bit reservoir, the audio encoder comprising:
bit allocation means for obtaining a number of additional bits based on the number of bits stored in the bit reservoir and the result of psychoacoustic analysis of the input audio signal, and for determining the number of bits available for encoding each frame of the audio signal by adding the obtained number of additional bits to the average number of bits per frame based on a set bit rate;
detecting means for detecting a power ratio between channels of the input audio signal;
correction means for correcting the number of additional bits by multiplying the number of additional bits, included in the number of available bits determined by the bit allocation means, by the power ratio between channels detected by the detecting means;
iterative loop means for repeatedly executing the process of quantizing and variable-length coding the input audio signal based on scale factors, within the corrected number of available bits, until the quantization distortion falls to or below a masking level; and
means for forming an audio bitstream including the result obtained by repeatedly executing the quantization process and the variable-length coding process.

An audio encoder that encodes an audio signal composed of a plurality of channels by a variable-length coding scheme using a bit reservoir, the audio encoder comprising:
bit allocation means for obtaining a number of additional bits based on the number of bits stored in the bit reservoir and the result of psychoacoustic analysis of the input audio signal, and for determining the number of bits available for encoding each frame of the audio signal by adding the obtained number of additional bits to the average number of bits per frame based on a set bit rate;
detecting means for detecting a ratio of psychoacoustic entropy between channels of the input audio signal based on the result of the psychoacoustic analysis;
correction means for correcting the number of additional bits by multiplying the number of additional bits, included in the number of available bits determined by the bit allocation means, by the ratio of psychoacoustic entropy between channels detected by the detecting means;
iterative loop means for repeatedly executing the process of quantizing and variable-length coding the input audio signal based on scale factors, within the corrected number of available bits, until the quantization distortion falls to or below a masking level; and
means for forming an audio bitstream including the result obtained by repeatedly executing the quantization process and the variable-length coding process.

The audio encoder according to claim 1, wherein the iterative loop means comprises processing means for repeatedly correcting the scale factors to obtain an optimal combination of scale factors for which the quantization distortion is equal to or less than the masking level, and
the processing means holds the scale factors obtained when the number of scale factor bands in which the quantization distortion exceeds the masking level is minimum and, when no optimal combination of scale factors that brings the quantization distortion to or below the masking level in all scale factor bands is found, obtains a suboptimal combination of scale factors based on the held scale factors.

The audio encoder, wherein the iterative loop means comprises processing means for repeatedly correcting the scale factors to obtain an optimal combination of scale factors for which the quantization distortion is equal to or less than the masking level, and
the processing means holds the scale factors obtained when the sum of values weighted according to the bandwidth of each scale factor band in which the quantization distortion exceeds the masking level is minimum and, when no optimal combination of scale factors that brings the quantization distortion to or below the masking level in all scale factor bands is found, obtains a suboptimal combination of scale factors based on the held scale factors.

The audio encoder, wherein the iterative loop means comprises processing means for repeatedly correcting the scale factors to obtain an optimal combination of scale factors for which the quantization distortion is equal to or less than the masking level, and
the processing means holds the scale factors obtained when the sum of the differences between the quantization distortion and the masking level, taken over the scale factor bands in which the quantization distortion exceeds the masking level, is minimum and, when no optimal combination of scale factors that brings the quantization distortion to or below the masking level in all scale factor bands is found, obtains a suboptimal combination of scale factors based on the held scale factors.
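
By way of non-limiting illustration, the three retention criteria recited above (the number of scale factor bands over the masking level, the bandwidth-weighted sum of those bands, and the sum of the excess distortion) might be sketched in Python as follows. All function names, the use of the band width itself as the weight, and the representation of loop iterations as (scale factors, per-band distortion) pairs are hypothetical and are not taken from the embodiments.

```python
from typing import Callable, Iterable, List, Optional, Sequence, Tuple

Cost = Callable[[Sequence[float], Sequence[float], Sequence[int]], float]


def count_over(distortion, mask, bandwidth):
    """Number of bands whose distortion exceeds the masking level."""
    return sum(1 for d, m in zip(distortion, mask) if d > m)


def weighted_over(distortion, mask, bandwidth):
    """Sum of band widths over the offending bands (assumed weighting)."""
    return sum(w for d, m, w in zip(distortion, mask, bandwidth) if d > m)


def excess_over(distortion, mask, bandwidth):
    """Sum of (distortion - mask) over the offending bands."""
    return sum(d - m for d, m in zip(distortion, mask) if d > m)


def pick_scale_factors(
    candidates: Iterable[Tuple[Sequence[int], Sequence[float]]],
    mask: Sequence[float],
    bandwidth: Sequence[int],
    cost: Cost,
) -> Tuple[Optional[List[int]], bool]:
    """Return (scale_factors, is_optimal).

    candidates yields one (scale_factors, distortion) pair per loop iteration.
    The first candidate with no band over the mask is returned as optimal;
    otherwise the held lowest-cost candidate is returned as suboptimal.
    """
    best: Optional[List[int]] = None
    best_cost: Optional[float] = None
    for scale_factors, distortion in candidates:
        if all(d <= m for d, m in zip(distortion, mask)):
            return list(scale_factors), True
        c = cost(distortion, mask, bandwidth)
        # Hold the scale factors only when the cost strictly improves.
        if best_cost is None or c < best_cost:
            best, best_cost = list(scale_factors), c
    return best, False
```

Passing count_over, weighted_over or excess_over as the cost argument selects the respective criterion; the three can retain different scale factors for the same sequence of iterations, since a single wide band over the mask counts as one band but carries a large weight.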

An audio signal encoding processing program used in an audio encoder that uses a computer and encodes an audio signal composed of a plurality of channels by a variable-length coding scheme using a bit reservoir, the program causing the computer to execute:
a processing step of detecting a power ratio between channels of the input audio signal;
a processing step of obtaining a number of additional bits based on the number of bits stored in the bit reservoir and the result of psychoacoustic analysis of the input audio signal, and determining the number of bits available for encoding each frame of the audio signal by adding the obtained number of additional bits to the average number of bits per frame based on a set bit rate;
a processing step of correcting the number of additional bits by multiplying the number of additional bits, included in the determined number of available bits, by the detected power ratio between channels;
a processing step of repeatedly performing quantization and variable-length coding on the input audio signal based on scale factors, within the corrected number of available bits, until the quantization distortion falls to or below a masking level; and
a processing step of forming an audio bitstream including the result obtained by repeatedly executing the quantization processing and the variable-length coding processing.

An audio signal encoding processing program used in an audio encoder that uses a computer and encodes an audio signal composed of a plurality of channels by a variable-length coding scheme using a bit reservoir, the program causing the computer to execute:
a processing step of detecting a ratio of psychoacoustic entropy between channels of the input audio signal based on the result of psychoacoustic analysis;
a processing step of obtaining a number of additional bits based on the number of bits stored in the bit reservoir and the result of psychoacoustic analysis of the input audio signal, and determining the number of bits available for encoding each frame of the audio signal by adding the obtained number of additional bits to the average number of bits per frame based on a set bit rate;
a processing step of correcting the number of additional bits by multiplying the number of additional bits, included in the determined number of available bits, by the detected ratio of psychoacoustic entropy between channels;
a processing step of repeatedly performing quantization and variable-length coding on the input audio signal based on scale factors, within the corrected number of available bits, until the quantization distortion falls to or below a masking level; and
a processing step of forming an audio bitstream including the result obtained by repeatedly executing the quantization processing and the variable-length coding processing.

The audio signal encoding processing program according to claim 6 or 7, wherein the repeatedly performed processing step includes a step of repeatedly correcting the scale factors to obtain an optimal combination of scale factors for which the quantization distortion is equal to or less than the masking level, and
the step of obtaining the optimal combination of scale factors holds the scale factors obtained when the number of scale factor bands in which the quantization distortion exceeds the masking level is minimum and, when no optimal combination of scale factors that brings the quantization distortion to or below the masking level in all scale factor bands is found, obtains a suboptimal combination of scale factors based on the held scale factors.

The audio signal encoding processing program according to claim 6 or 7, wherein the repeatedly performed processing step includes a step of repeatedly correcting the scale factors to obtain an optimal combination of scale factors for which the quantization distortion is equal to or less than the masking level, and
the step of obtaining the optimal combination of scale factors holds the scale factors obtained when the sum of values weighted according to the bandwidth of each scale factor band in which the quantization distortion exceeds the masking level is minimum and, when no optimal combination of scale factors that brings the quantization distortion to or below the masking level in all scale factor bands is found, obtains a suboptimal combination of scale factors based on the held scale factors.

The audio signal encoding processing program according to claim 6 or 7, wherein the repeatedly performed processing step includes a step of repeatedly correcting the scale factors to obtain an optimal combination of scale factors for which the quantization distortion is equal to or less than the masking level, and
the step of obtaining the optimal combination of scale factors holds the scale factors obtained when the sum of the differences between the quantization distortion and the masking level, taken over the scale factor bands in which the quantization distortion exceeds the masking level, is minimum and, when no optimal combination of scale factors that brings the quantization distortion to or below the masking level in all scale factor bands is found, obtains a suboptimal combination of scale factors based on the held scale factors.