JP4597360B2

JP4597360B2 - Speech decoding apparatus and speech decoding method

Info

Publication number: JP4597360B2
Application number: JP2000395644A
Authority: JP
Inventors: 幸司吉田
Original assignee: Panasonic Corp; Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Corp; Panasonic Holdings Corp
Priority date: 2000-12-26
Filing date: 2000-12-26
Publication date: 2010-12-15
Anticipated expiration: 2020-12-26
Also published as: JP2002196795A

Description

【０００１】
【発明の属する技術分野】
本発明は、音声信号を符号化して伝送する移動通信や有線通信システム等の用途に用いられる音声復号装置及び音声復号方法に関し、特に音声復号時の符号化データが損失した際に復号音声品質の劣化を抑えるフレーム補償機能を備えた音声復号装置及び音声復号方法に関する。
【０００２】
【従来の技術】
従来、ディジタル移動通信や有線通信の分野においては、電波や有線回線の有効利用のために音声情報を圧縮し、低いビットレートで符号化する音声符号化装置が用いられている。そして、そのような通信システムにおいて、特にパケット単位で伝送を行うシステムでは、受信側（音声復号側）で受信するパケットが損失したり、時間的に遅れて到着して、音声復号を行うのに必要なデータが得られないこと（フレーム損失）があり、その際に復号音声品質の劣化を抑えるためのフレーム補償が行われる。
【０００３】
そのような従来の技術として、ＩＴＵ−Ｔ勧告Ｇ．７２９（"Coding of speech at 8kbit/s using conjugate-structure algebraic-code-excited linear-prediction(CS-ACELP)"）のＣＳ−ＡＣＥＬＰ符号化方式に記載された誤り補償方法がある。
【０００４】
図１１は、ＣＳ−ＡＣＥＬＰ符号化方式のフレーム補償機能を備えた音声復号装置の構成を示すブロック図である。この図において、音声復号は１０ｍｓのフレーム単位で行われ、そのフレーム単位でフレーム損失の有無が音声復号器に通知されるものとする。
【０００５】
まず、フレーム損失が検出されないフレームにおいては、データ分離部１１０１において、復号に必要な各パラメータに分離される。そして、ラグパラメータ復号部１１０２により復号されたラグパラメータを用いて適応音源符号帳部１１０３により適応音源が、また固定音源符号帳部１１０４により固定音源が生成され、ゲインパラメータ復号部１１０５により復号されたゲインを用いて、乗算部１１０６による乗算および加算部１１０７による加算により駆動音源が生成され、ＬＰＣパラメータ復号部１１０８により復号されたＬＰＣパラメータを用いてＬＰＣ合成フィルタ１１０９およびポストフィルタ１１１０を経由して復号音声が生成される。
【０００６】
一方、フレーム損失が検出されたフレームに対しては、ラグパラメータとして、誤りが検出されなかった前フレームのラグパラメータを用いて適応音源を生成し、また、固定音源符号帳部１１０４に対してランダムな固定音源符号を与えることで固定音源を生成し、ゲインパラメータとして、前フレームの適応音源ゲインおよび固定音源ゲインを減衰させた値を用いて駆動音源を生成し、ＬＰＣパラメータとして、前フレームＬＰＣパラメータを用いてＬＰＣ合成およびポストフィルタ処理を行い復号音声を得る。このように、フレーム損失時にフレーム補償処理を行う。
【０００７】
【発明が解決しようとする課題】
しかしながら、上記従来の音声復号装置においては、次のような問題がある。
すなわち、損失フレームより過去のパラメータを用いてフレーム補償処理を行っているため、損失フレームの次の正常フレームにおける復号において、その復号に必要な過去の状態データ（適応符号帳やＬＳＰパラメータやゲインパラメータのＭＡ予測フィルタ状態など）が、本来の（フレーム損失がない場合の）値との相違が大きくなることがあり、それにより復号音声の品質が劣化する。
【０００８】
また、損失フレームが音声の立ち上がりなど過渡的な区間を含む場合には、その区間の再生が行われないことになり、復号音声の品質劣化が生じる。このように、フレーム補償時の復号音声品質劣化の改善に限界がある。
【０００９】
本発明はかかる点に鑑みてなされたものであり、フレーム損失が検出されたフレームにおいて、より改善された復号音声品質を実現することのできる音声復号装置及び音声符号化・復号装置を提供することを目的とする。
【００１０】
【課題を解決するための手段】
本発明の音声復号装置は、フレーム単位の符号化データを受け取り蓄積する受信バッファと、現フレームの符号化データを正しく受信した場合にその符号化データを用いて通常の音声復号を行う音声復号手段と、現フレームの符号化データが損失により受信できなかった場合に損失フレーム補償を行う損失フレーム補償手段と、前記受信バッファが符号化データを時間的に遅れて受信した場合にその符号化データと現フレームの符号化データとを用いて音声復号を行うと共に前記音声復号手段で復号に必要な状態データを更新する損失フレーム復号手段と、を具備する構成を採る。
【００１１】
この構成によれば、損失フレームの符号化データが時間的に遅れて受信された場合に、フレーム補償で復号音声を出力後、その符号化データを用いて過去の損失フレームに相当する区間の音声復号を行って音声復号のために必要な状態データを更新し、その後現在の正常フレームの音声復号を行うことにより、より本来の値に近い状態データを用いた復号が可能となり、復号音声品質の劣化を抑えることが可能となる。
【００１２】
また、本発明の音声復号装置は、フレーム単位の符号化データを受け取り蓄積する受信バッファと、現フレームの符号化データを正しく受信した場合にその符号化データを用いて通常の音声復号を行う音声復号手段と、現フレームの符号化データが損失により受信できなかった場合に損失フレーム補償を行う損失フレーム補償手段と、前記受信バッファが符号化データを時間的に遅れて受信した場合にその符号化データと現フレームの符号化データを用いて音声復号を行うと共に前記音声復号手段で復号に必要な状態データを更新する損失フレーム復号手段と、前記損失フレーム復号手段が復号した現フレームの復号音声信号と前記音声復号手段が復号した現フレームの復号音声信号とを用いて窓かけ加算を行う窓かけ加算手段と、を具備する構成を採る。
【００１３】
この構成によれば、損失フレームの符号化データが時間的に遅れて受信された場合、フレーム補償で復号音声を出力後、その符号化データを用いて過去の損失フレームに相当する区間の音声復号を行って状態データを更新し、その後、現在の正常フレームの音声復号を行って得られた復号音声信号と、損失フレームのフレーム補償処理により状態データを更新した後に音声復号を行って得られた現在の正常フレームの復号音声信号とを用いて、窓かけ加算を行うことにより、より滑らかな復号を行うことが可能となる。
【００１４】
本発明の音声復号装置は、フレーム単位の音声の特性を表す音声モード情報を少なくとも含む、フレーム単位の符号化データを受け取り蓄積する受信バッファと、現フレームの符号化データを正しく受信した場合にその符号化データを用いて通常の音声復号を行う音声復号手段と、現フレームの符号化データが損失により受信できなかった場合に損失フレーム補償を行う損失フレーム補償手段と、前記受信バッファが符号化データを時間的に遅れて受信した場合にその符号化データと現フレームの符号化データを用いて音声復号を行うと共に前記音声復号手段で復号に必要な状態データを更新する損失フレーム復号手段と、前記損失フレーム復号手段が復号した損失フレームと現フレームの復号音声信号に対して時間軸圧縮する時間軸圧縮手段と、前記音声モード情報に応じて、前記時間軸圧縮手段で得られた復号音声信号又は前記損失フレーム復号手段で得られた現フレームの復号音声信号のいずれかを選択し、選択した復号音声信号と前記音声復号手段が復号した現フレームの復号音声信号とを用いて窓かけ加算を行う窓かけ加算手段と、を具備する構成を採る。
【００１５】
この構成によれば、損失フレームの符号化データが時間的に遅れて受信された場合、フレーム補償で復号音声を出力後、その符号化データおよび現在の正常フレームの符号化データを用いて過去の損失フレームおよび現在の正常フレームの区間の音声復号を行い、符号化側で判定されたモード情報が過渡的区間であることを示す場合には、得られた復号信号を時間軸圧縮して正常フレームの復号音声として用いることにより、過渡区間でのフレーム損失に伴う復号音声品質の劣化を抑えることが可能となる。
【００１６】
本発明の基地局装置は、上記音声復号装置を具備する構成を採る。
【００１７】
本発明の通信端末装置は、上記音声復号装置を具備する構成を採る。
【００１８】
本発明の音声復号方法は、フレーム単位の音声の特性を表す音声モード情報を少なくとも含む、フレーム単位の符号化データを受け取り蓄積する受信バッファリング工程と、現フレームの符号化データを正しく受信した場合にその符号化データを用いて通常の音声復号を行う音声復号工程と、現フレームの符号化データが損失により受信できなかった場合に損失フレーム補償を行う損失フレーム補償工程と、前記受信バッファが符号化データを時間的に遅れて受信した場合にその符号化データと現フレームの符号化データを用いて音声復号を行うと共に前記音声復号工程で復号に必要な状態データを更新する損失フレーム復号工程と、前記損失フレーム復号工程で復号された損失フレームと現フレームの復号音声信号に対して時間軸圧縮する時間軸圧縮工程と、前記音声モード情報に応じて、前記時間軸圧縮工程で得られた復号音声信号又は前記損失フレーム復号工程で得られた現フレームの復号音声信号のいずれかを選択し、選択した復号音声信号と前記音声復号工程で復号された現フレームの復号音声信号とを用いて窓かけ加算を行う窓かけ加算工程と、を具備する。
【００１９】
この方法によれば、損失フレームの符号化データが時間的に遅れて受信された場合、フレーム補償で復号音声を出力後、その符号化データを用いて過去の損失フレームに相当する区間の音声復号を行って音声復号のために必要な状態データを更新し、その後、現在の正常フレームの音声復号を行うことにより、より本来の値に近い状態データを用いた復号が可能となり、復号音声品質の劣化を抑えることが可能となる。
【００２０】
本発明の記録媒体は、音声復号プログラムを格納し、コンピュータにより読み取り可能な記録媒体であって、前記音声復号プログラムは、フレーム単位の音声の特性を表す音声モード情報を少なくとも含む、フレーム単位の符号化データを受け取り蓄積する受信バッファリング手順と、現フレームの符号化データを正しく受信した場合にその符号化データを用いて通常の音声復号を行う音声復号手順と、現フレーム符号化データが損失により受信できなかった場合に損失フレーム補償を行う損失フレーム補償手順と、符号化データを時間的に遅れて受信した場合にその符号化データと現フレーム符号化データを用いて音声復号を行うと共に復号に必要な状態データを更新する損失フレーム復号手順と、前記損失フレーム復号手順で復号された損失フレームと現フレームの復号音声信号に対して時間軸圧縮する時間軸圧縮手順と、前記音声モード情報に応じて、前記時間軸圧縮手順で得られた復号音声信号又は前記損失フレーム復号手順で得られた現フレームの復号音声信号のいずれかを選択し、選択した復号音声信号と前記音声復号手順で復号された現フレームの復号音声信号とを用いて窓かけ加算を行う窓かけ加算手順と、を実行させるためのプログラムを記録した機械読み取り可能なものである。
【００２１】
【発明の実施の形態】
本発明の骨子は、損失フレームの符号化データが時間的に遅れて受信された場合に、フレーム補償で復号音声を出力後、その符号化データを用いて過去の損失フレームに相当する区間の音声復号を行って音声復号のために必要な状態データを更新し、その後現在の正常フレームの音声復号を行うことである。
【００２２】
以下、本発明の実施の形態について、図面を参照して詳細に説明する。
【００２３】
（実施の形態１）
図１は、本発明の実施の形態１に係る音声復号装置の構成を示すブロック図である。
この図において、本実施の形態の音声復号装置は、受信バッファ１０１と、音声復号部１０２と、状態データ保持部１０３と、損失フレーム補償部１０４と、損失フレーム復号部１０５と、切替え部１０６とを備えて構成される。
【００２４】
受信バッファ１０１は、送信側でフレーム単位で符号化されてパケット単位で伝送された符号化データを受信し、受信した符号化データを出力するとともに、受信したパケット単位のデータをフレーム単位の符号化データに分解してフレーム損失が生じたか否かの情報（フレーム損失情報）を出力する。
【００２５】
音声復号部１０２は、現フレーム符号化データを正しく受信した場合にその符号化データを用いて通常の音声復号を行う。状態データ保持部１０３は、音声復号部１０２及び損失フレーム復号部１０５で使用される状態データを保持するものである。この状態データは、音声復号部１０２の具体構成に依存するが、ＣＥＬＰ型の復号器の場合には、適応符号帳やＬＳＰ・ゲインパラメータのＭＡ予測フィルタ状態などがある。
【００２６】
損失フレーム補償部１０４は、現フレーム符号化データが損失により受信できなかった場合に損失フレーム補償を行う。損失フレーム復号部１０５は、受信バッファ１０１が符号化データを時間的に遅れて受信した場合に、その符号化データと現フレーム符号化データを用いて音声復号を行うと共に、音声復号部１０２で復号に必要な状態データを更新する。切替え部１０６は、フレーム損失情報に応じて、音声復号部１０２の出力と、損失フレーム補償部１０４と、損失フレーム復号部１０５との切り替えを行う。
【００２７】
このような構成の音声復号装置においては、現時刻のフレームに対するフレーム損失情報によって異なる動作を行う。
まず、現時点のフレームに対する符号化データが損失や遅延なく正しく受信された場合には、音声復号部１０２にてその符号化データを用いて通常の音声復号を行い復号音声信号として出力する。その際、復号に必要な状態データが更新されて状態データ保持部１０３で保持される。
【００２８】
次に、現時点のフレームに対する符号化データを損失フレーム即ち損失または遅延のため正しく受信できない場合には、損失フレーム補償部１０４にて損失フレーム補償を行い、それにより得られた復号音声信号を出力する。この補償処理は任意であり、従来の方法による過去の符号化パラメータや復号データを用いた補償処理を行ってもよい。
【００２９】
次に、前フレームが損失フレームであり、かつ現時点においてその損失フレームの符号化データが時間的に遅れて受信された場合は、その前フレーム符号化データと現時点の正しく受信された現フレームの符号化データを損失フレーム復号部１０５に入力する。
【００３０】
損失フレーム復号部１０５では、入力された前フレーム及び現フレームの符号化データを順次用いて音声復号を行い（図２の（ａ）および（ｂ））、現フレームの符号化データに対する復号音声信号（図２の（ｂ））を出力する。また復号に必要な状態データが更新されて、状態データ保持部１０３で保持される。損失フレーム復号部１０５における音声復号処理は、音声復号部１０２と基本的には同じ処理でよい。切替え部１０６では、以上のフレーム損失情報に応じた切り替えを行って当該フレームの復号音声信号を出力する。
【００３１】
次に、図３のフローチャートを参照して、本実施の形態に係る音声復号装置の動作を説明する。
【００３２】
まず、ステップ１０１において、送信側でフレーム単位で符号化されてパケット単位で伝送された符号化データを受信して蓄積し、さらに蓄積したパケット単位のデータをフレーム単位の符号化データに分解し、フレーム損失が生じたか否かを示す情報（フレーム損失情報）と共に出力する。
【００３３】
次いで、ステップ１０２において、現時点のフレームに対するフレーム損失情報を判定する。この判定において、現時点のフレームに対する符号化データが損失や遅延なく正しく受信できたと判断すると、ステップ１０３に進み、その符号化データを用いて通常の音声復号を行い、復号音声信号として出力する。これに対して、現時点のフレームに対する符号化データが損失フレーム即ち損失または遅延のため正しく受信できないと判断すると、ステップ１０４に進み、損失フレーム補償を行い、それにより得られた復号音声信号を出力する。
【００３４】
また、前フレームが損失フレームであり、かつ現時刻においてその損失フレームの符号化データが時間的に遅れて受信したと判断すると、ステップ１０５に進み、前フレームの符号化データと現時点の正しく受信された現フレームの符号化データを順次用いて音声復号を行い、現フレーム符号化データに対する復号音声信号を出力する。この復号音声信号を出力した後は、ステップ１０６で復号に必要な状態データを更新する。
ステップ１０３、ステップ１０４又はステップ１０６の処理を行った後、ステップ１０７で復号音声信号を出力信号として出力する。
【００３５】
このように、本実施の形態に係る音声復号装置によれば、損失フレームの符号化データが時間的に遅れて受信された場合には、フレーム補償で復号音声を出力した後、その符号化データを用いて過去の損失フレームに相当する区間の音声復号を行って音声復号のために必要な状態データを更新し、その後現在の正常フレームの音声復号を行うようにしたので、より本来の値に近い状態データを用いた復号が可能となり、復号音声品質の劣化を抑えることが可能となる。
【００３６】
（実施の形態２）
図４は、本発明の実施の形態２に係る音声復号装置の構成を示すブロック図である。なお、この図において前述した図１と共通する部分には同一の符号を付けている。
【００３７】
この図において、本実施の形態の音声復号装置は、実施の形態１の音声復号装置と同一の構成に加えて、窓かけ加算部１０７を備えている。
受信バッファ１０１において、送信側でフレーム単位で符号化され、パケット単位で伝送された符号化データを受信する。そして、パケット単位のデータをフレーム単位の符号化データに分解して、フレーム損失が生じたか否かの情報（フレーム損失情報）と共に出力する。そして、以降の動作は現時刻のフレームに対するフレーム損失情報によって異なる。
【００３８】
まず、現時点のフレームに対する符号化データが損失や遅延なく正しく受信された場合には、音声復号部１０２において、その符号化データを用いて通常の音声復号を行い、復号音声信号として出力する。その際、復号に必要な状態データが更新されて状態データ保持部１０３で保持される。
【００３９】
次に、現時点のフレームに対する符号化データが損失フレーム即ち損失または遅延のため正しく受信できない場合には、損失フレーム補償部１０４において損失フレーム補償を行い、それにより得られた復号音声信号を出力する。なお、この補償処理は任意であり、従来の方法による過去の符号化パラメータや復号データを用いた補償処理を行ってもよい。
【００４０】
次に、前フレームが損失フレームであり、かつ現時刻においてその損失フレームの符号化データが時間的に遅れて受信された場合には、その前フレーム符号化データと現時刻の正しく受信された現フレーム符号化データを損失フレーム復号部１０５に入力する。
【００４１】
損失フレーム復号部１０５では、入力された前フレーム及び現フレーム符号化データを順次用いて音声復号を行い（図５の（ａ）および（ｂ））、現フレーム符号化データに対する復号音声信号（図５の（ｂ））を出力する。また、復号に必要な状態データが更新されて状態データ保持部１０３で保持される。損失フレーム復号部１０５における音声復号処理は音声復号部１０２と基本的には同じ処理でよい。また、現時点の正しく受信された現フレーム符号化データを用いて、音声復号部１０２において音声復号を行い、復号音声信号（図５の（ｃ））を出力する。
【００４２】
窓かけ加算部１０７では、損失フレーム復号部１０５からの出力信号（図５の（ｂ））および音声復号部１０２からの出力信号（図５の（ｃ））を用いて、当該フレームの開始端では音声復号部１０２の出力が、終了端では損失フレーム復号部１０５の出力が支配的になるような窓かけ加算を行い、復号音声信号として出力する。そして、切替え部１０６において、以上のフレーム損失情報に応じた切り替えが行われて、当該フレームの復号音声信号が出力される。
【００４３】
次に、図６のフローチャートを参照して、本実施の形態に係る音声復号装置の動作を説明する。
【００４４】
まず、ステップ２０１において、送信側でフレーム単位で符号化されパケット単位で伝送された符号化データを受信し、さらに受信したパケット単位のデータをフレーム単位の符号化データに分解して、フレーム損失が生じたか否かを示す情報（フレーム損失情報）と共に出力する。次いで、ステップ２０２において、現時刻のフレームに対するフレーム損失情報を判定する。
【００４５】
この判定において、現時点のフレームに対する符号化データを損失や遅延なく正しく受信できたと判断すると、ステップ２０３に進み、その符号化データを用いて通常の音声復号を行い、復号音声信号として出力する。これに対して、現時点のフレームに対する符号化データが損失フレーム即ち損失または遅延のため正しく受信できないと判断すると、ステップ２０４に進み、損失フレーム補償を行い、それにより得られた復号音声信号を出力する。
【００４６】
また、前フレームが損失フレームであり、かつ現時刻においてその損失フレームの符号化データを時間的に遅れて受信したと判断すると、ステップ２０５に進み、前フレームの符号化データと現時点の正しく受信された現フレームの符号化データを順次用いて音声復号を行い、現フレームの符号化データに対する復号音声信号を出力する。そして、ステップ２０６で復号に必要な状態データを更新する。
【００４７】
データの更新を行った後、ステップ２０７において、現時点の正しく受信された現フレームの符号化データを用いて通常の音声復号を行い、復号音声信号を出力する。次いで、ステップ２０８において、ステップ２０４における出力信号およびステップ２０５における出力信号を用いて窓かけ加算を行い、復号音声信号として出力する。
ステップ２０３、ステップ２０４又はステップ２０８の処理を行った後、ステップ２０９で復号音声信号を出力信号として出力する。
【００４８】
このように、本実施の形態に係る音声復号装置によれば、損失フレームの符号化データが時間的に遅れて受信された場合には、フレーム補償で復号音声を出力した後、そのときの符号化データを用いて過去の損失フレームに相当する区間の音声復号を行って状態データを更新し、その後、現在の正常フレームの音声復号を行って得られた復号音声信号と損失フレームのフレーム補償処理により状態データを更新した後に音声復号を行って得られた現在の正常フレームの復号音声信号とを用いて、窓かけ加算を行うようにしたので、より滑らかな復号を行うことが可能となる。
【００４９】
（実施の形態３）
図７は、本発明の実施の形態３に係る音声符号化・復号装置の構成を示すブロック図である。なお、この図において前述した図１又は図４と共通する部分には同一の符号を付けている。
【００５０】
この図において、本実施の形態の音声符号化・復号装置の符号化側は、音声モード判定部１１０と、音声符号化部１１１と、多重化部１１２とを備えて構成され、復号側は、受信バッファ１０１と、分離部１１３と、音声復号部１０２と、状態データ保持部１０３と、損失フレーム補償部１０４と、損失フレーム復号部１０５と、時間軸圧縮部１１４と、切替え部１１５と、切替え部１０６と、窓かけ加算部１０７とを備えて構成される。
【００５１】
音声符号化部１１１では、フレーム単位の入力音声に対して音声符号化を行い、符号化データを出力する。音声モード判定部１１０では、フレーム単位の入力音声の音声モードを判定する。ここで、音声モードとは、フレーム単位の入力音声が定常的区間か立ち上がり等の過渡的な区間かという特性を表すものである。
多重化部１１２は、音声符号化部１１１から出力される符号化データと音声モード判定部１１０で得られた音声モードを多重化して音声モード情報込みの符号化データとして出力する。
【００５２】
受信バッファ１０１では、送信側でフレーム単位で符号化され、パケット単位で伝送された符号化データを受信する。さらに、受信したパケット単位のデータをフレーム単位の符号化データに分解して、フレーム損失が生じたか否かの情報（フレーム損失情報）と共に出力する。分離部１１３では、音声モード情報込みの符号化データから音声モード情報と音声符号化データに分離する。そして、以降の動作は、現時刻のフレームに対するフレーム損失情報によって異なる動作となる。
【００５３】
現時点のフレームに対する符号化データが、損失や遅延なく正しく受信された場合には、音声復号部１０２において、その符号化データを用いて通常の音声復号を行い、復号音声信号として出力する。その際、復号に必要な状態データが更新されて状態データ保持部１０３で保持される。
【００５４】
ここで、状態データとは音声復号部１０２の具体構成に依存するが、ＣＥＬＰ型の復号器の場合には、適応符号帳やＬＳＰ・ゲインパラメータのＭＡ予測フィルタ状態などがある。
【００５５】
一方、現時点のフレームに対する符号化データが、損失フレーム即ち損失または遅延のため正しく受信できない場合には、損失フレーム補償部１０４において損失フレーム補償を行い、それにより得られた復号音声信号を出力する。この補償処理は任意であり、従来の方法による過去の符号化パラメータや復号データを用いた補償処理を行ってもよい。
【００５６】
また、前フレームが損失フレームであり、かつ現時点においてその損失フレームの符号化データが時間的に遅れて受信された場合は、その前フレーム符号化データと現時刻の正しく受信された現フレーム符号化データを損失フレーム復号部１０５に入力する。
【００５７】
損失フレーム復号部１０５では、入力された前フレーム及び現フレームの符号化データを順次用いて音声復号を行い（図８の（ａ）および（ｂ））、現フレーム符号化データに対する復号音声信号（図８の（ｂ））を切替え部１１５に、前フレーム及び現フレームの符号化データに対する復号音声信号（図８の（ａ）と（ｂ））を時間軸圧縮部１１４に出力する。また、復号に必要な状態データが更新され状態データ保持部１０３で保持される。この損失フレーム復号部１０５における音声復号処理は、音声復号部１０２と基本的には同じ処理で良い。
【００５８】
時間軸圧縮部１１４では、前フレーム及び現フレームの符号化データに対する復号音声信号（図８の（ａ）と（ｂ））を１フレームの区間の信号となるように時間軸圧縮を行い、圧縮後の信号（図８の（ｄ））を切替え部１１５に出力する。また、現時点の正しく受信された現フレームの符号化データを用いて、音声復号部１０２において音声復号を行い復号音声信号（図８の（ｃ））を出力する。そして、切替え部１１５において、分離部１１３から得られた損失フレームである前フレームの音声モード情報が過渡的モードを示す場合には、時間軸圧縮部１１４の出力を、そうでない場合即ち定常モード場合には、損失フレーム復号部１０５の出力を窓かけ加算部１０７に出力する。
【００５９】
窓かけ加算部１０７では、切替え部１１５を通過した出力信号（図８の（ｂ）または（ｄ））および音声復号部１０２における出力信号（図８の（ｃ））を用いて、当該フレームの開始端では音声復号部１０２の出力が、終了端では切替え部１１５を通過した出力信号が支配的になるような窓かけ加算を行い、復号音声信号として出力する。そして、切替え部１０６において、以上のフレーム損失情報に応じた切り替えが行われ当該フレームの復号音声信号が出力される。
【００６０】
次に、本実施の形態に係る音声復号装置の動作を説明する。
図９は、符号化側の動作を示すフローチャートである。
ステップ３０１において、フレーム単位の入力音声に対して音声符号化を行い符号化データを出力する。次いで、ステップ３０２においてフレーム単位の入力音声の音声モードを判定する。そして、ステップ３０３においてステップ３０１で得られる符号化データとステップ３０２で得られた音声モードを多重化して音声モード情報込みの符号化データとして出力する。
【００６１】
図１０は、復号側の動作を示すフローチャートである。
まず。ステップ３０４において、送信側でフレーム単位で符号化され、パケット単位で伝送された符号化データを受信する。さらに受信したパケット単位のデータをフレーム単位の符号化データに分解して、フレーム損失が生じたか否かの情報（フレーム損失情報）と共に出力する。次いで、ステップ３０５において音声モード情報込みの符号化データから音声モード情報と音声符号化データに分離する。
【００６２】
次いで、ステップ３０６において、現時点のフレームに対するフレーム損失情報を判定する。まず、現時刻のフレームに対する符号化データが損失や遅延なく正しく受信できたと判断すると、ステップ３０７に進み、その符号化データを用いて通常の音声復号を行い、復号音声信号として出力する。
【００６３】
一方、現時点のフレームに対する符号化データが損失フレーム即ち損失または遅延のため正しく受信できないと判断すると、ステップ３０８に進み、損失フレーム補償を行い、それにより得られた復号音声信号を出力する。
【００６４】
また、前フレームが損失フレームであり、かつ現時刻においてその損失フレームの符号化データが時間的に遅れて受信したと判断すると、ステップ３０９に進み、前フレームの符号化データと現時点の正しく受信された現フレームの符号化データを順次用いて音声復号を行い、前フレーム及び現フレーム符号化データに対する復号音声信号を出力する。次いで、ステップ３１０において、復号に必要な状態データを更新する。
【００６５】
次いで、ステップ３１１において、ステップ３０９で得られた前フレーム及び現フレームの符号化データに対する復号音声信号を１フレームの区間の信号となるように時間軸圧縮を行い、圧縮後の信号を出力する。次いで、ステップ３１２において、現時点の正しく受信された現フレームの符号化データを用いて通常の音声復号を行い復号音声信号を出力する。
【００６６】
次いで、ステップ３１３において、ステップ３０５で得られた、損失フレームである前フレームの音声モード情報を判定する。この判定において、音声モード情報が過渡的モードであると判断すると、ステップ３１４に進み、ステップ３１１の処理の出力とステップ３１２の処理の出力を用いた窓かけ加算を行い、復号音声信号として出力する。
一方、音声モード情報が過渡的モードでないと判断すると、ステップ３１５に進み、ステップ３０９の処理で得られた現フレームの符号化データに対する復号音声信号とステップ３１２の処理の出力を用いた窓かけ加算を行い、復号音声信号として出力する。
ステップ３０７、ステップ３０８、ステップ３１４又はステップ３１５の処理を行った後、ステップ３１６で復号音声信号を出力信号として出力する。
【００６７】
このように、本実施の形態に係る音声復号装置によれば、損失フレームの符号化データが時間的に遅れて受信した場合に、フレーム補償で復号音声を出力後、その符号化データおよび現在の正常フレームの符号化データを用いて、過去の損失フレームおよび現在の正常フレームの区間の音声復号を行い、符号化側で判定されたモード情報が過渡的区間であることを示す場合には、得られた復号信号を時間軸圧縮して正常フレームの復号音声として用いるので、過渡区間でのフレーム損失に伴う復号音声品質の劣化を抑えることができる。
【００６８】
【発明の効果】
以上説明したように、本発明の音声復号装置によれば、損失フレームの符号化データが時間的に遅れて受信された場合に、フレーム補償で復号音声を出力後、その符号化データを用いて、過去の損失フレームに相当する区間の音声復号を行って音声復号のために必要な状態データを更新した後に、現在の正常フレームの音声復号を行うことにより、より本来の値に近い状態データを用いた復号が可能となり、復号音声品質の劣化を抑えることができる。
【００６９】
また、本発明の音声復号装置によれば、損失フレームの符号化データが時間的に遅れて受信された場合に、フレーム補償で復号音声を出力後、その符号化データを用いて、過去の損失フレームに相当する区間の音声復号を行って状態データを更新した後に、現在の正常フレームの音声復号を行って得られた復号音声信号と、損失フレームのフレーム補償処理により状態データを更新した後に音声復号を行って得られた現在の正常フレームの復号音声信号とを用いて、窓かけ加算を行うことにより、より滑らかな復号を行うことができる。
【００７０】
また、本発明の音声復号装置によれば、損失フレームの符号化データが時間的に遅れて受信された場合に、フレーム補償で復号音声を出力後、その符号化データおよび現在の正常フレームの符号化データを用いて、過去の損失フレームおよび現在の正常フレームの区間の音声復号を行い、符号化側で判定されたモード情報が過渡的区間であることを示す場合には、得られた復号信号を時間軸圧縮して正常フレームの復号音声として用いることにより、過渡区間でのフレーム損失に伴う復号音声品質の劣化を抑えることができる。
【図面の簡単な説明】
【図１】本発明の実施の形態１に係る音声復号装置の構成を示すブロック図
【図２】実施の形態１に係る音声復号装置において損失フレームの符号化データが遅延して受信された場合の処理を説明する図
【図３】実施の形態１に係る音声復号装置の動作を示すフローチャート
【図４】本発明の実施の形態２に係る音声復号装置の構成を示すブロック図
【図５】実施の形態２に係る音声復号装置において損失フレームの符号化データが遅延して受信された場合の処理を説明する図
【図６】実施の形態２に係る音声復号装置の動作を示すフローチャート
【図７】本発明の実施の形態３に係る音声符号化・復号装置の構成を示すブロック図
【図８】実施の形態３に係る音声符号化・復号装置における損失フレームの符号化データが遅延して受信された場合の処理を説明する図
【図９】実施の形態３に係る音声符号化・復号装置の動作を示すフローチャート
【図１０】実施の形態３に係る音声符号化・復号装置の動作を示すフローチャート
【図１１】従来の音声復号装置の構成を示すブロック図
【符号の説明】
１０１受信バッファ
１０２音声復号部
１０３状態データ保持部
１０４損失フレーム補償部
１０５損失フレーム復号部
１０６、１１５切替え部
１０７窓かけ加算部
１１０音声モード判定部
１１１音声符号化部
１１２多重化部
１１３分離部
１１４時間軸圧縮部
[0001]
[Technical Field of the Invention]
The present invention relates to a speech decoding apparatus and a speech decoding method used in applications such as mobile communication and wired communication systems that encode and transmit speech signals. In particular, it relates to a speech decoding apparatus and a speech decoding method having a frame compensation function that suppresses degradation of decoded speech quality when encoded data is lost at the time of speech decoding.
[0002]
[Prior art]
Conventionally, in the fields of digital mobile communication and wired communication, speech encoding devices that compress speech information and encode it at a low bit rate have been used to make effective use of radio waves and wired lines. In such communication systems, particularly those that transmit in units of packets, a packet may be lost or may arrive late at the receiving side (the speech decoding side), so that the data needed for speech decoding cannot be obtained (frame loss). In that case, frame compensation is performed to suppress degradation of decoded speech quality.
[0003]
One such conventional technique is the error compensation method described for the CS-ACELP coding scheme of ITU-T Recommendation G.729 ("Coding of speech at 8 kbit/s using conjugate-structure algebraic-code-excited linear-prediction (CS-ACELP)").
[0004]
FIG. 11 is a block diagram showing the configuration of a speech decoding apparatus having the frame compensation function of the CS-ACELP coding scheme. In this figure, speech decoding is performed in 10 ms frames, and the presence or absence of frame loss is reported to the speech decoder for each frame.
[0005]
First, in a frame in which no frame loss is detected, the data separation unit 1101 separates the encoded data into the parameters necessary for decoding. The adaptive excitation codebook unit 1103 then generates an adaptive excitation using the lag parameter decoded by the lag parameter decoding unit 1102, and the fixed excitation codebook unit 1104 generates a fixed excitation. Using the gains decoded by the gain parameter decoding unit 1105, a driving excitation is generated through multiplication by the multiplication unit 1106 and addition by the addition unit 1107. Decoded speech is then generated by passing the driving excitation through the LPC synthesis filter 1109, which uses the LPC parameters decoded by the LPC parameter decoding unit 1108, and the post filter 1110.
[0006]
On the other hand, for a frame in which frame loss is detected, the adaptive excitation is generated using, as the lag parameter, the lag parameter of the previous frame in which no error was detected. The fixed excitation is generated by giving a random fixed excitation code to the fixed excitation codebook unit 1104. The driving excitation is generated using, as gain parameters, attenuated values of the adaptive excitation gain and the fixed excitation gain of the previous frame. LPC synthesis and post-filter processing are then performed using the LPC parameters of the previous frame to obtain decoded speech. Frame compensation processing is performed in this way when a frame is lost.
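The parameter substitution described above can be sketched as follows. This is a minimal illustration, not the G.729 procedure itself: the `FrameParams` container, the attenuation factor `GAIN_ATTENUATION`, and the codebook size `NUM_FIXED_CODES` are hypothetical values chosen for the example.

```python
import random
from dataclasses import dataclass


@dataclass
class FrameParams:
    lag: int              # adaptive codebook lag (pitch delay)
    fixed_code: int       # fixed excitation codebook index
    adaptive_gain: float  # adaptive excitation gain
    fixed_gain: float     # fixed excitation gain
    lpc: tuple            # LPC parameters


# Hypothetical constants for illustration only (not taken from the source).
GAIN_ATTENUATION = 0.9
NUM_FIXED_CODES = 1 << 13


def conceal_params(prev: FrameParams, rng: random.Random) -> FrameParams:
    """Build substitute parameters for a lost frame from the previous good frame."""
    return FrameParams(
        lag=prev.lag,                                # reuse the previous lag
        fixed_code=rng.randrange(NUM_FIXED_CODES),   # random fixed excitation code
        adaptive_gain=prev.adaptive_gain * GAIN_ATTENUATION,  # attenuated gain
        fixed_gain=prev.fixed_gain * GAIN_ATTENUATION,        # attenuated gain
        lpc=prev.lpc,                                # reuse the previous LPC parameters
    )
```

Because every substituted value comes from before the loss, the decoder state after concealment can drift away from what the encoder assumed, which is the problem the following paragraphs describe.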
[0007]
[Problems to be solved by the invention]
However, the conventional speech decoding apparatus has the following problems.
That is, because frame compensation processing uses parameters from before the lost frame, the past state data needed to decode the normal frame that follows the lost frame (such as the adaptive codebook and the MA prediction filter states of the LSP and gain parameters) may differ greatly from its original values (the values it would have had without frame loss), which degrades the quality of the decoded speech.
[0008]
In addition, when the lost frame contains a transient section such as a speech onset, that section is not reproduced, and the quality of the decoded speech degrades. There is thus a limit to how far the degradation of decoded speech quality during frame compensation can be reduced.
[0009]
The present invention has been made in view of the above points, and an object thereof is to provide a speech decoding apparatus and a speech encoding/decoding apparatus that can achieve improved decoded speech quality in frames in which frame loss is detected.
[0010]
[Means for Solving the Problems]
The speech decoding apparatus of the present invention adopts a configuration comprising: a reception buffer that receives and stores encoded data in units of frames; speech decoding means that performs normal speech decoding using the encoded data of the current frame when that data is correctly received; lost frame compensation means that performs lost frame compensation when the encoded data of the current frame cannot be received due to loss; and lost frame decoding means that, when the reception buffer receives encoded data with a time delay, performs speech decoding using that encoded data and the encoded data of the current frame, and updates the state data that the speech decoding means needs for decoding.
[0011]
According to this configuration, when the encoded data of a lost frame is received with a time delay, decoded speech is first output by frame compensation. The late encoded data is then used to decode the section corresponding to the past lost frame and to update the state data needed for speech decoding, after which the current normal frame is decoded. This allows decoding with state data closer to the original values and suppresses degradation of decoded speech quality.
[0012]
The speech decoding apparatus of the present invention also adopts a configuration comprising: a reception buffer that receives and stores encoded data in units of frames; speech decoding means that performs normal speech decoding using the encoded data of the current frame when that data is correctly received; lost frame compensation means that performs lost frame compensation when the encoded data of the current frame cannot be received due to loss; lost frame decoding means that, when the reception buffer receives encoded data with a time delay, performs speech decoding using that encoded data and the encoded data of the current frame, and updates the state data that the speech decoding means needs for decoding; and windowed addition means that performs windowed addition using the decoded speech signal of the current frame decoded by the lost frame decoding means and the decoded speech signal of the current frame decoded by the speech decoding means.
[0013]
According to this configuration, when the encoded data of a lost frame is received with a time delay, decoded speech is first output by frame compensation, and the late encoded data is then used to decode the section corresponding to the past lost frame and to update the state data. Windowed addition is then performed using the decoded speech signal obtained by decoding the current normal frame and the decoded speech signal of the current normal frame obtained by decoding after the state data was updated by the frame compensation processing of the lost frame. This makes smoother decoding possible.
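The windowed addition can be pictured as a cross-fade between the two decodings of the current frame. The sketch below follows Embodiment 2, where the output of the speech decoding means dominates at the start of the frame and the output of the lost frame decoding means dominates at the end. The linear (triangular) window and the function name `windowed_add` are assumptions; the source does not specify the window shape.

```python
def windowed_add(from_speech_decoder, from_lost_frame_decoder):
    """Cross-fade two decoded versions of the same frame.

    The weight of the lost-frame-decoder signal rises linearly from 0 at the
    first sample to 1 at the last sample (an assumed window shape).
    """
    n = len(from_speech_decoder)
    if n != len(from_lost_frame_decoder):
        raise ValueError("both signals must cover the same frame length")
    denom = max(n - 1, 1)  # avoid division by zero for one-sample frames
    out = []
    for i in range(n):
        w = i / denom
        out.append((1.0 - w) * from_speech_decoder[i] + w * from_lost_frame_decoder[i])
    return out
```

Starting from the signal that continues the concealed output avoids a discontinuity at the frame boundary, while ending on the signal decoded with re-synchronised state hands over cleanly to the following frames.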
[0014]
The speech decoding apparatus of the present invention adopts a configuration comprising: a reception buffer that receives and stores encoded data in units of frames, the encoded data including at least speech mode information representing the characteristics of the speech in each frame; speech decoding means that performs normal speech decoding using the encoded data of the current frame when that data is correctly received; lost frame compensation means that performs lost frame compensation when the encoded data of the current frame cannot be received due to loss; lost frame decoding means that, when the reception buffer receives encoded data with a time delay, performs speech decoding using that encoded data and the encoded data of the current frame, and updates the state data that the speech decoding means needs for decoding; time-axis compression means that performs time-axis compression on the decoded speech signals of the lost frame and the current frame decoded by the lost frame decoding means; and windowed addition means that selects, according to the speech mode information, either the decoded speech signal obtained by the time-axis compression means or the decoded speech signal of the current frame obtained by the lost frame decoding means, and performs windowed addition using the selected decoded speech signal and the decoded speech signal of the current frame decoded by the speech decoding means.
[0015]
According to this configuration, when the encoded data of a lost frame is received with a time delay, decoded speech is first output by frame compensation, and the late encoded data and the encoded data of the current normal frame are then used to decode the sections of the past lost frame and the current normal frame. When the mode information determined on the encoding side indicates a transient section, the resulting decoded signal is compressed on the time axis and used as the decoded speech of the normal frame, which suppresses the degradation of decoded speech quality caused by frame loss in a transient section.
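The mode-dependent selection can be sketched as below. The source does not say how the time-axis compression is performed, so this example uses plain linear-interpolation resampling as a stand-in; that choice shifts pitch, and a practical decoder would more likely use a pitch-preserving method. The names `TRANSIENT`, `STATIONARY`, `compress_time_axis`, and `select_for_windowed_add` are hypothetical.

```python
TRANSIENT = "transient"    # assumed label for a transient (e.g. onset) frame
STATIONARY = "stationary"  # assumed label for a stationary frame


def compress_time_axis(samples, out_len):
    """Squeeze `samples` into `out_len` samples by linear interpolation."""
    n = len(samples)
    if n < 2 or out_len < 2:
        raise ValueError("need at least two input and two output samples")
    step = (n - 1) / (out_len - 1)
    out = []
    for i in range(out_len):
        pos = i * step
        k = min(int(pos), n - 2)   # left neighbour index, clamped at the end
        frac = pos - k
        out.append((1.0 - frac) * samples[k] + frac * samples[k + 1])
    return out


def select_for_windowed_add(prev_frame_mode, lost_decoded, current_decoded):
    """Choose the signal that is passed on to the windowed addition.

    Transient previous frame: both decoded frames compressed into one frame.
    Otherwise: the current frame as decoded by the lost frame decoder.
    """
    if prev_frame_mode == TRANSIENT:
        both = list(lost_decoded) + list(current_decoded)
        return compress_time_axis(both, len(current_decoded))
    return list(current_decoded)
```

With this selection, a lost onset is still audible, compressed into the current frame, instead of being skipped entirely.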
[0016]
The base station apparatus of the present invention adopts a configuration comprising the above speech decoding apparatus.
[0017]
The communication terminal apparatus of the present invention adopts a configuration comprising the above speech decoding apparatus.
[0018]
The speech decoding method of the present invention comprises: a reception buffering step of receiving and storing encoded data in units of frames, the encoded data including at least speech mode information representing the characteristics of the speech in each frame; a speech decoding step of performing normal speech decoding using the encoded data of the current frame when that data is correctly received; a lost frame compensation step of performing lost frame compensation when the encoded data of the current frame cannot be received due to loss; a lost frame decoding step of, when the reception buffer receives encoded data with a time delay, performing speech decoding using that encoded data and the encoded data of the current frame, and updating the state data needed for decoding in the speech decoding step; a time-axis compression step of performing time-axis compression on the decoded speech signals of the lost frame and the current frame decoded in the lost frame decoding step; and a windowed addition step of selecting, according to the speech mode information, either the decoded speech signal obtained in the time-axis compression step or the decoded speech signal of the current frame obtained in the lost frame decoding step, and performing windowed addition using the selected decoded speech signal and the decoded speech signal of the current frame decoded in the speech decoding step.
[0019]
According to this method, when the encoded data of a lost frame is received with a time delay, decoded speech is first output by frame compensation. The late encoded data is then used to decode the section corresponding to the past lost frame and to update the state data needed for speech decoding, after which the current normal frame is decoded. This allows decoding with state data closer to the original values and suppresses degradation of decoded speech quality.
[0020]
The recording medium of the present invention is a computer-readable recording medium that stores a speech decoding program. The program causes a computer to execute: a reception buffering procedure of receiving and storing encoded data in units of frames, the encoded data including at least speech mode information representing the characteristics of the speech in each frame; a speech decoding procedure of performing normal speech decoding using the encoded data of the current frame when that data is correctly received; a lost frame compensation procedure of performing lost frame compensation when the encoded data of the current frame cannot be received due to loss; a lost frame decoding procedure of, when encoded data is received with a time delay, performing speech decoding using that encoded data and the encoded data of the current frame, and updating the state data needed for decoding; a time-axis compression procedure of performing time-axis compression on the decoded speech signals of the lost frame and the current frame decoded in the lost frame decoding procedure; and a windowed addition procedure of selecting, according to the speech mode information, either the decoded speech signal obtained in the time-axis compression procedure or the decoded speech signal of the current frame obtained in the lost frame decoding procedure, and performing windowed addition using the selected decoded speech signal and the decoded speech signal of the current frame decoded in the speech decoding procedure.
[0021]
DETAILED DESCRIPTION OF THE INVENTION
The gist of the present invention is that, when the encoded data of a lost frame is received with a time delay, decoded speech is first output by frame compensation, the late encoded data is then used to decode the section corresponding to the past lost frame and to update the state data needed for speech decoding, and the current normal frame is decoded after that.
[0022]
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
[0023]
(Embodiment 1)
FIG. 1 is a block diagram showing the configuration of the speech decoding apparatus according to Embodiment 1 of the present invention.
In this figure, the speech decoding apparatus according to the present embodiment comprises a reception buffer 101, a speech decoding unit 102, a state data holding unit 103, a lost frame compensation unit 104, a lost frame decoding unit 105, and a switching unit 106.
[0024]
The reception buffer 101 receives encoded data that was encoded in units of frames on the transmitting side and transmitted in units of packets, and outputs the received encoded data. It also decomposes the received packet data into frame-by-frame encoded data and outputs information indicating whether frame loss has occurred (frame loss information).
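One way to picture the frame loss information is as a three-way status that the buffer reports at each frame's decoding time. The sketch below is an assumption about how such a buffer could be organised: the class `ReceptionBuffer`, the enum `FrameStatus`, and the use of integer frame indices are all hypothetical and are not taken from the source.

```python
from enum import Enum


class FrameStatus(Enum):
    NORMAL = 1                 # current frame received in time
    LOST = 2                   # current frame missing (lost or late)
    PREVIOUS_ARRIVED_LATE = 3  # current frame received, and the previously
                               # lost frame has now arrived as well


class ReceptionBuffer:
    def __init__(self):
        self._frames = {}        # frame index -> encoded data
        self._concealed = set()  # indices that were reported as LOST

    def push(self, index, data):
        """Store the encoded data of one frame (possibly arriving late)."""
        self._frames[index] = data

    def status(self, index):
        """Report the frame loss information when frame `index` is due."""
        if index not in self._frames:
            self._concealed.add(index)
            return FrameStatus.LOST
        prev = index - 1
        if prev in self._concealed and prev in self._frames:
            return FrameStatus.PREVIOUS_ARRIVED_LATE
        return FrameStatus.NORMAL

    def get(self, index):
        """Return the stored encoded data for `index`, or None if absent."""
        return self._frames.get(index)
```

The third status is what distinguishes this apparatus from a conventional decoder, which would treat a late packet as permanently lost.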
[0025]
The speech decoding unit 102 performs normal speech decoding using the encoded data of the current frame when that data is correctly received. The state data holding unit 103 holds the state data used by the speech decoding unit 102 and the lost frame decoding unit 105. The state data depends on the specific configuration of the speech decoding unit 102; in the case of a CELP-type decoder, it includes the adaptive codebook and the MA prediction filter states of the LSP and gain parameters.
[0026]
The lost frame compensation unit 104 performs lost frame compensation when the encoded data of the current frame cannot be received due to loss. When the reception buffer 101 receives encoded data with a time delay, the lost frame decoding unit 105 performs speech decoding using that encoded data and the encoded data of the current frame, and updates the state data that the speech decoding unit 102 needs for decoding. The switching unit 106 switches among the outputs of the speech decoding unit 102, the lost frame compensation unit 104, and the lost frame decoding unit 105 according to the frame loss information.
[0027]
A speech decoding apparatus configured in this way operates differently depending on the frame loss information for the frame at the current time.
First, when the encoded data for the current frame is correctly received without loss or delay, the speech decoding unit 102 performs normal speech decoding using that encoded data and outputs the result as a decoded speech signal. At this time, the state data necessary for decoding is updated and held in the state data holding unit 103.
[0028]
Next, when the encoded data for the current frame cannot be correctly received because of loss or delay (that is, the frame is a lost frame), the lost frame compensation unit 104 performs lost frame compensation and outputs the resulting decoded speech signal. Any compensation method may be used; for example, a conventional method using past coding parameters and decoded data.
[0029]
Next, when the previous frame is a lost frame and the encoded data of that lost frame has now been received with a time delay, the encoded data of the previous frame and the correctly received encoded data of the current frame are input to the lost frame decoding unit 105.
[0030]
The lost frame decoding unit 105 performs speech decoding using the input encoded data of the previous frame and the current frame in order ((a) and (b) in FIG. 2), and outputs the decoded speech signal for the encoded data of the current frame ((b) in FIG. 2). The state data necessary for decoding is also updated and held in the state data holding unit 103. The speech decoding process in the lost frame decoding unit 105 may be basically the same as that of the speech decoding unit 102. The switching unit 106 switches according to the frame loss information described above and outputs the decoded speech signal of the frame.
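The effect of decoding the late frame before the current one can be shown with a toy decoder whose output depends on state carried across frames. The `ToyDecoder` below is only a stand-in for a CELP decoder, with a single recursive state in place of the adaptive codebook and predictor states. The sketch also assumes that the state held before the lost frame is still available when the late data arrives, which the source implies but does not spell out.

```python
class ToyDecoder:
    """Stand-in for a state-dependent speech decoder (not CELP itself)."""

    def __init__(self, state=0.0):
        self.state = state

    def decode(self, frame):
        """Decode one frame; every output sample depends on past state."""
        out = []
        for residual in frame:
            self.state = residual + 0.5 * self.state
            out.append(self.state)
        return out

    def conceal(self, num_samples):
        """Crude concealment: let the state decay with no new input."""
        out = []
        for _ in range(num_samples):
            self.state = 0.5 * self.state
            out.append(self.state)
        return out


def decode_with_late_frame(state_before_loss, late_prev_frame, current_frame):
    """Decode the late previous frame, then the current frame, in order.

    The previous frame's output, (a), is not played because concealed speech
    was already output for that interval; it only re-synchronises the state.
    Returns the current frame's output, (b), and the updated state.
    """
    decoder = ToyDecoder(state_before_loss)
    decoder.decode(late_prev_frame)        # (a): state update only
    out = decoder.decode(current_frame)    # (b): decoded speech for output
    return out, decoder.state
```

In this toy model the output for the current frame matches what a loss-free decoder would have produced, whereas decoding the current frame directly from the concealed state does not.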
[0031]
Next, the operation of the speech decoding apparatus according to this embodiment will be described with reference to the flowchart of FIG. 3.
[0032]
First, in step 101, encoded data that has been encoded in units of frames on the transmission side and transmitted in units of packets is received and stored, and the stored packet-unit data is further decomposed into frame-unit encoded data and output together with information indicating whether or not frame loss has occurred (frame loss information).
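The buffering in step 101 can be pictured with a small sketch. The Python below is illustrative only: the `ReceptionBuffer` class, the one-frame-per-packet layout, the sequence numbers, and the three status labels are hypothetical and are not taken from the specification.

```python
from enum import Enum


class FrameStatus(Enum):
    RECEIVED = "received"        # current frame arrived in time
    LOST = "lost"                # current frame missing (lost or late)
    LATE_PREVIOUS = "late_prev"  # current frame arrived and the previous
                                 # lost frame has now arrived late


class ReceptionBuffer:
    """Hypothetical buffer: one frame per packet, keyed by sequence number."""

    def __init__(self):
        self.frames = {}            # sequence number -> encoded frame
        self.reported_lost = set()  # frames already reported as lost

    def push(self, seq, payload):
        """Store an arriving packet (assumed to carry exactly one frame)."""
        self.frames[seq] = payload

    def pop(self, seq):
        """Return (status, previous_payload, current_payload) for frame seq."""
        current = self.frames.pop(seq, None)
        if current is None:
            self.reported_lost.add(seq)
            return FrameStatus.LOST, None, None
        prev = seq - 1
        if prev in self.reported_lost and prev in self.frames:
            self.reported_lost.discard(prev)
            return FrameStatus.LATE_PREVIOUS, self.frames.pop(prev), current
        return FrameStatus.RECEIVED, None, current
```

The status returned by `pop` plays the role of the frame loss information that selects among the three decoding paths described in the following steps.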
[0033]
Next, in step 102, frame loss information for the current frame is determined. In this determination, if it is determined that the encoded data for the current frame has been correctly received without loss or delay, the process proceeds to step 103, where normal speech decoding is performed using the encoded data, and the decoded speech signal is output. On the other hand, if it is determined that the encoded data for the current frame cannot be received correctly because the frame is a lost frame, that is, because of loss or delay, the process proceeds to step 104, where lost frame compensation is performed and the decoded speech signal obtained thereby is output.
[0034]
If it is determined that the previous frame is a lost frame and the encoded data of that lost frame has been received with a time delay at the current time, the process proceeds to step 105, where speech decoding is performed by sequentially using the encoded data of the previous frame and the correctly received encoded data of the current frame, and a decoded speech signal for the encoded data of the current frame is output. After the decoded speech signal is output, the state data necessary for decoding is updated in step 106.
After performing the processing of step 103, step 104, or step 106, in step 107, the decoded audio signal is output as an output signal.
[0035]
As described above, according to the speech decoding apparatus of the present embodiment, when the encoded data of a lost frame is received with a time delay after decoded speech has been output by frame compensation, that encoded data is used to perform speech decoding of the section corresponding to the past lost frame and thereby update the state data necessary for speech decoding, and speech decoding of the current normal frame is then performed. Decoding that uses state data closer to the original values is thus possible, and degradation of decoded speech quality can be suppressed.
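The three paths of Embodiment 1 can be sketched with a toy stateful decoder. Everything here is an assumption made for illustration: the "codec" simply accumulates deltas so that its state is a single number, the compensation repeats the last value without touching the state, and the status strings are hypothetical labels for the frame loss information. A real CELP decoder would hold far richer state.

```python
def decode(deltas, state):
    """Toy stateful decoder: each output sample is the running sum of deltas."""
    out = []
    for d in deltas:
        state += d
        out.append(state)
    return out, state


def conceal(state, frame_len):
    """Toy compensation: repeat the last known value; state is left unchanged."""
    return [state] * frame_len, state


def process(status, prev_data, cur_data, state, frame_len):
    """Return (output samples for the current frame, updated state)."""
    if status == "received":                 # corresponds to step 103
        return decode(cur_data, state)
    if status == "lost":                     # corresponds to step 104
        return conceal(state, frame_len)
    if status == "late_previous":            # corresponds to steps 105 and 106
        # Decode the late-arriving lost frame only to repair the state,
        # then decode the current frame from the repaired state.
        _, state = decode(prev_data, state)
        return decode(cur_data, state)
    raise ValueError(f"unknown status: {status}")
```

In this toy model, the frame decoded on the late-arrival path matches what an uninterrupted decoder would have produced, which is the effect the embodiment aims for.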
[0036]
(Embodiment 2)
FIG. 4 is a block diagram showing the configuration of the speech decoding apparatus according to Embodiment 2 of the present invention. In this figure, the same reference numerals are given to the portions common to FIG. 1 described above.
[0037]
In this figure, the speech decoding apparatus according to the present embodiment includes a windowing addition unit 107 in addition to the same configuration as the speech decoding apparatus according to the first embodiment.
The reception buffer 101 receives encoded data that has been encoded in units of frames on the transmission side and transmitted in units of packets. Then, the packet unit data is decomposed into frame unit encoded data and output together with information on whether or not frame loss has occurred (frame loss information). The subsequent operations differ depending on the frame loss information for the current time frame.
[0038]
First, when the encoded data for the current frame is correctly received without loss or delay, the audio decoding unit 102 performs normal audio decoding using the encoded data, and outputs the decoded audio signal. At this time, state data necessary for decoding is updated and held in the state data holding unit 103.
[0039]
Next, when the encoded data for the current frame cannot be received correctly due to a lost frame, that is, loss or delay, the lost frame compensation unit 104 performs lost frame compensation, and outputs the decoded speech signal obtained thereby. Note that this compensation processing is arbitrary, and compensation processing using past coding parameters and decoded data by a conventional method may be performed.
[0040]
Next, when the previous frame is a lost frame and the encoded data of that lost frame is received with a time delay at the current time, the encoded data of the previous frame and the correctly received encoded data of the current frame are input to the lost frame decoding unit 105.
[0041]
Lost frame decoding section 105 performs speech decoding by sequentially using the input encoded data of the previous frame and the current frame ((a) and (b) of FIG. 5), and outputs a decoded speech signal for the encoded data of the current frame ((b) of FIG. 5). In addition, state data necessary for decoding is updated and held in the state data holding unit 103. The speech decoding process in the lost frame decoding unit 105 may be basically the same process as in the speech decoding unit 102. Also, using the encoded data of the current frame correctly received at the present time, speech decoding section 102 performs speech decoding and outputs a decoded speech signal ((c) of FIG. 5).
[0042]
The windowing addition unit 107 uses the output signal of the lost frame decoding unit 105 ((b) of FIG. 5) and the output signal of the speech decoding unit 102 ((c) of FIG. 5) to perform windowed addition such that the output of the speech decoding unit 102 is dominant at the start of the frame and the output of the lost frame decoding unit 105 is dominant at the end of the frame, and outputs the result as a decoded speech signal. Then, the switching unit 106 performs switching according to the above frame loss information, and outputs a decoded speech signal of the frame.
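The windowed addition can be illustrated as a cross-fade over one frame. The sketch below assumes a linear (triangular) window pair, which the specification does not prescribe; the function and parameter names are hypothetical.

```python
def windowed_add(from_unit_102, from_unit_105):
    """Cross-fade two decoded versions of the same frame.

    from_unit_102: signal that should dominate at the start of the frame.
    from_unit_105: signal that should dominate at the end of the frame.
    A linear window is assumed here purely for illustration.
    """
    n = len(from_unit_102)
    if n != len(from_unit_105):
        raise ValueError("both signals must cover the same frame")
    if n == 1:
        return [from_unit_105[0]]
    out = []
    for i in range(n):
        w = i / (n - 1)  # 0.0 at the frame start, 1.0 at the frame end
        out.append((1.0 - w) * from_unit_102[i] + w * from_unit_105[i])
    return out
```

Because the two weights always sum to one, the result moves gradually from one signal to the other without a jump at either frame boundary.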
[0043]
Next, the operation of the speech decoding apparatus according to the present embodiment will be described with reference to the flowchart of FIG. 6.
[0044]
First, in step 201, encoded data that has been encoded in units of frames on the transmitting side and transmitted in units of packets is received, and the received packet-unit data is further decomposed into frame-unit encoded data and output together with information indicating whether or not frame loss has occurred (frame loss information). Next, in step 202, frame loss information for the current frame is determined.
[0045]
In this determination, if it is determined that the encoded data for the current frame has been correctly received without loss or delay, the process proceeds to step 203, where normal speech decoding is performed using the encoded data and the result is output as a decoded speech signal. On the other hand, if it is determined that the encoded data for the current frame cannot be received correctly because the frame is a lost frame, that is, because of loss or delay, the process proceeds to step 204, where lost frame compensation is performed and the decoded speech signal obtained thereby is output.
[0046]
If it is determined that the previous frame is a lost frame and the encoded data of that lost frame has been received with a time delay at the current time, the process proceeds to step 205, where speech decoding is performed by sequentially using the encoded data of the previous frame and the correctly received encoded data of the current frame, and a decoded speech signal for the encoded data of the current frame is output. In step 206, state data necessary for decoding is updated.
[0047]
After the state data has been updated, in step 207, normal speech decoding is performed using the encoded data of the current frame correctly received at the present time, and a decoded speech signal is output. Next, in step 208, windowed addition is performed using the output signal of step 205 and the output signal of step 207, and the result is output as a decoded speech signal.
After performing the processing of step 203, step 204 or step 208, the decoded audio signal is output as an output signal in step 209.
[0048]
Thus, according to the speech decoding apparatus of the present embodiment, when the encoded data of a lost frame is received with a time delay after decoded speech has been output by frame compensation, that encoded data is used to perform speech decoding of the section corresponding to the past lost frame and thereby update the state data. Windowed addition is then performed using the decoded speech signal of the current normal frame obtained by speech decoding that follows the frame compensation processing of the lost frame, and the decoded speech signal of the current normal frame obtained by speech decoding after the state data has been updated, so that smoother decoding can be performed.
[0049]
(Embodiment 3)
FIG. 7 is a block diagram showing a configuration of a speech encoding / decoding apparatus according to Embodiment 3 of the present invention. In this figure, the same reference numerals are given to the portions common to FIG. 1 or FIG. 4 described above.
[0050]
In this figure, the encoding side of the speech encoding / decoding apparatus according to the present embodiment includes a speech mode determination unit 110, a speech encoding unit 111, and a multiplexing unit 112. The decoding side includes a reception buffer 101, a separation unit 113, a speech decoding unit 102, a state data holding unit 103, a lost frame compensation unit 104, a lost frame decoding unit 105, a time axis compression unit 114, a switching unit 115, a switching unit 106, and a windowing addition unit 107.
[0051]
The speech encoding unit 111 performs speech encoding on input speech in units of frames and outputs encoded data. The speech mode determination unit 110 determines the speech mode of the input speech in units of frames. Here, the speech mode represents whether the input speech of the frame is a steady section or a transient section such as a speech onset.
The multiplexing unit 112 multiplexes the encoded data output from the speech encoding unit 111 and the speech mode obtained by the speech mode determination unit 110, and outputs the result as encoded data including speech mode information.
[0052]
The reception buffer 101 receives encoded data that is encoded in units of frames on the transmission side and transmitted in units of packets. Further, the received packet unit data is decomposed into frame unit encoded data and output together with information (frame loss information) on whether or not frame loss has occurred. The separation unit 113 separates the encoded data including the audio mode information into audio mode information and audio encoded data. The subsequent operations differ depending on the frame loss information for the current frame.
[0053]
When the encoded data for the current frame is correctly received without loss or delay, the audio decoding unit 102 performs normal audio decoding using the encoded data, and outputs the decoded audio signal. At this time, the state data necessary for decoding is updated and held in the state data holding unit 103.
[0054]
Here, the state data depends on the specific configuration of the speech decoding unit 102; in the case of a CELP decoder, it includes the adaptive codebook, the MA prediction filter states for the LSP parameters and the gain parameters, and the like.
[0055]
On the other hand, when the encoded data for the current frame cannot be correctly received due to a lost frame, that is, loss or delay, the lost frame compensation unit 104 performs lost frame compensation and outputs the decoded speech signal obtained thereby. This compensation processing is arbitrary, and compensation processing using past coding parameters and decoded data by a conventional method may be performed.
[0056]
Also, if the previous frame is a lost frame and the encoded data of that lost frame is received with a time delay at the present time, the encoded data of the previous frame and the correctly received encoded data of the current frame are input to the lost frame decoding unit 105.
[0057]
Lost frame decoding section 105 performs speech decoding by sequentially using the input encoded data of the previous frame and the current frame ((a) and (b) of FIG. 8). It outputs the decoded speech signal for the encoded data of the current frame ((b) of FIG. 8) to the switching unit 115, and outputs the decoded speech signals for the encoded data of the previous frame and the current frame ((a) and (b) of FIG. 8) to the time axis compression unit 114. In addition, state data necessary for decoding is updated and held in the state data holding unit 103. The speech decoding process in the lost frame decoding unit 105 may be basically the same process as in the speech decoding unit 102.
[0058]
The time-axis compression unit 114 performs time-axis compression so that the decoded speech signals for the encoded data of the previous frame and the current frame ((a) and (b) of FIG. 8) fit into a signal of one frame interval, and outputs the compressed signal ((d) of FIG. 8) to the switching unit 115. Also, using the encoded data of the current frame correctly received at the present time, speech decoding section 102 performs speech decoding and outputs a decoded speech signal ((c) of FIG. 8). The switching unit 115 then outputs to the windowing addition unit 107 the output of the time-axis compression unit 114 when the speech mode information of the previous frame (the lost frame), obtained from the separation unit 113, indicates the transient mode, and the output of the lost frame decoding unit 105 otherwise, that is, in the steady mode.
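As a rough illustration of the time-axis compression and the mode-dependent selection, the sketch below squeezes two decoded frames into one frame length by plain linear-interpolation resampling. This is a stand-in chosen for brevity: the specification does not state which compression method is used, and naive resampling also raises the pitch, which a practical implementation would likely avoid. All names are hypothetical.

```python
def compress_two_frames(prev_decoded, cur_decoded):
    """Squeeze two decoded frames into the length of one frame.

    Uses linear interpolation over the concatenated signal; index
    arithmetic is done with integers so the end points are exact.
    """
    src = list(prev_decoded) + list(cur_decoded)
    n_out = len(cur_decoded)
    if n_out == 0:
        return []
    if n_out == 1:
        return [float(src[-1])]
    den = n_out - 1
    out = []
    for i in range(n_out):
        num = i * (len(src) - 1)
        lo = num // den                 # left neighbour in the source
        frac = (num % den) / den        # fractional position between neighbours
        hi = min(lo + 1, len(src) - 1)  # right neighbour, clamped at the end
        out.append((1.0 - frac) * src[lo] + frac * src[hi])
    return out


def select_for_windowing(mode_is_transient, compressed, lost_decoded_current):
    """Mimic switching unit 115: compressed signal only in the transient mode."""
    return compressed if mode_is_transient else lost_decoded_current
```

The selected signal is then what the windowing addition unit combines with the output of the speech decoding unit 102.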
[0059]
The windowing addition unit 107 uses the output signal that has passed through the switching unit 115 ((b) or (d) of FIG. 8) and the output signal of the speech decoding unit 102 ((c) of FIG. 8) to perform windowed addition such that the output of the speech decoding unit 102 is dominant at the start of the frame and the output signal that has passed through the switching unit 115 is dominant at the end of the frame, and outputs the result as a decoded speech signal. Then, the switching unit 106 performs switching according to the above frame loss information and outputs a decoded speech signal of the frame.
[0060]
Next, the operation of the speech encoding / decoding apparatus according to this embodiment will be described.
FIG. 9 is a flowchart showing the operation on the encoding side.
In step 301, speech encoding is performed on input speech in units of frames, and encoded data is output. Next, in step 302, the speech mode of the input speech is determined in units of frames. In step 303, the encoded data obtained in step 301 and the speech mode obtained in step 302 are multiplexed and output as encoded data including speech mode information.
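Steps 302 and 303, together with the separation performed on the decoding side, might look like the following sketch. The one-byte mode header, the energy-ratio rule for detecting a transient, and the threshold value are all assumptions introduced for illustration; the specification does not define the determination criterion or the bitstream layout.

```python
STEADY, TRANSIENT = 0, 1


def frame_energy(samples):
    """Sum of squared samples over one frame."""
    return sum(s * s for s in samples)


def determine_mode(prev_samples, cur_samples, ratio=4.0):
    """Hypothetical rule: a sharp energy rise marks a transient (onset) frame."""
    prev_e = max(frame_energy(prev_samples), 1e-12)  # guard against silence
    return TRANSIENT if frame_energy(cur_samples) > ratio * prev_e else STEADY


def multiplex(mode, payload):
    """Prepend the mode as a single header byte (assumed layout)."""
    if mode not in (STEADY, TRANSIENT):
        raise ValueError("unknown speech mode")
    return bytes([mode]) + payload


def separate(frame):
    """Split a multiplexed frame back into (mode, speech encoded data)."""
    if not frame:
        raise ValueError("empty frame")
    return frame[0], frame[1:]
```

Carrying the mode inside each frame means the decoder learns whether a late-arriving lost frame was transient as soon as that frame is separated.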
[0061]
FIG. 10 is a flowchart showing the operation on the decoding side.
First, in step 304, encoded data that has been encoded in units of frames on the transmission side and transmitted in units of packets is received. Further, the received packet-unit data is decomposed into frame-unit encoded data and output together with information on whether or not frame loss has occurred (frame loss information). Next, in step 305, the encoded data including the speech mode information is separated into speech mode information and speech encoded data.
[0062]
Next, in step 306, frame loss information for the current frame is determined. First, when it is determined that the encoded data for the frame at the current time has been correctly received without loss or delay, the process proceeds to step 307, where normal audio decoding is performed using the encoded data, and the decoded audio signal is output.
[0063]
On the other hand, if it is determined that the encoded data for the current frame cannot be received correctly due to a lost frame, that is, loss or delay, the process proceeds to step 308, where lost frame compensation is performed, and the resulting decoded speech signal is output.
[0064]
If it is determined that the previous frame is a lost frame and the encoded data of that lost frame has been received with a time delay at the current time, the process proceeds to step 309, where speech decoding is performed by sequentially using the encoded data of the previous frame and the correctly received encoded data of the current frame, and decoded speech signals for the encoded data of the previous frame and the current frame are output. Next, in step 310, state data necessary for decoding is updated.
[0065]
Next, in step 311, time-axis compression is performed so that the decoded speech signal for the encoded data of the previous frame and current frame obtained in step 309 becomes a signal of one frame interval, and the compressed signal is output. Next, in step 312, normal speech decoding is performed using the encoded data of the current frame correctly received at the present time, and a decoded speech signal is output.
[0066]
Next, in step 313, the speech mode information of the previous frame (the lost frame), obtained in step 305, is determined. If it is determined that the speech mode information indicates the transient mode, the process proceeds to step 314, where windowed addition is performed using the output of step 311 and the output of step 312, and the result is output as a decoded speech signal.
On the other hand, if it is determined that the speech mode information does not indicate the transient mode, the process proceeds to step 315, where windowed addition is performed using the decoded speech signal for the encoded data of the current frame obtained in step 309 and the output of step 312, and the result is output as a decoded speech signal.
After performing the processing of Step 307, Step 308, Step 314, or Step 315, Step 316 outputs the decoded speech signal as an output signal.
[0067]
As described above, according to the speech decoding apparatus of the present embodiment, when the encoded data of a lost frame is received with a time delay, that encoded data and the encoded data of the current normal frame are used to perform speech decoding of the past lost frame and the current normal frame. When the mode information determined on the encoding side indicates a transient section, the resulting decoded signal is time-axis compressed and used as the decoded speech of the normal frame, so that degradation of decoded speech quality due to frame loss in a transient section can be suppressed.
[0068]
[Effects of the invention]
As described above, according to the speech decoding apparatus of the present invention, when the encoded data of a lost frame is received with a time delay after decoded speech has been output by frame compensation, that encoded data is used to perform speech decoding of the section corresponding to the past lost frame and thereby update the state data necessary for speech decoding, and speech decoding of the current normal frame is then performed. Decoding that uses state data closer to the original values can thus be performed, and degradation of decoded speech quality can be suppressed.
[0069]
Further, according to the speech decoding apparatus of the present invention, when the encoded data of a lost frame is received with a time delay after decoded speech has been output by frame compensation, that encoded data is used to perform speech decoding of the section corresponding to the past lost frame and thereby update the state data. Windowed addition is then performed using the decoded speech signal of the current normal frame obtained by speech decoding that follows the frame compensation processing of the lost frame, and the decoded speech signal of the current normal frame obtained by speech decoding after the state data has been updated, so that smoother decoding can be performed.
[0070]
In addition, according to the speech decoding apparatus of the present invention, when the encoded data of a lost frame is received with a time delay after decoded speech has been output by frame compensation, that encoded data and the encoded data of the current normal frame are used to perform speech decoding of the sections of the past lost frame and the current normal frame. When the mode information determined on the encoding side indicates a transient section, the resulting decoded signal is time-axis compressed and used as the decoded speech of the normal frame, so that degradation of decoded speech quality due to frame loss in a transient section can be suppressed.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a configuration of a speech decoding apparatus according to Embodiment 1 of the present invention.
FIG. 2 is a diagram for explaining processing when encoded data of a lost frame is received with a delay in the speech decoding apparatus according to Embodiment 1;
FIG. 3 is a flowchart showing the operation of the speech decoding apparatus according to the first embodiment.
FIG. 4 is a block diagram showing a configuration of a speech decoding apparatus according to Embodiment 2 of the present invention.
FIG. 5 is a diagram for explaining processing when encoded data of a lost frame is received with a delay in the speech decoding apparatus according to Embodiment 2;
FIG. 6 is a flowchart showing the operation of the speech decoding apparatus according to the second embodiment.
FIG. 7 is a block diagram showing a configuration of a speech encoding / decoding device according to Embodiment 3 of the present invention.
FIG. 8 is a diagram for explaining processing when coded data of a lost frame is received with a delay in the speech coding / decoding apparatus according to Embodiment 3;
FIG. 9 is a flowchart showing the operation of the speech encoding / decoding device according to Embodiment 3.
FIG. 10 is a flowchart showing the operation of the speech encoding / decoding device according to Embodiment 3.
FIG. 11 is a block diagram showing a configuration of a conventional speech decoding apparatus.
[Explanation of symbols]
101 Receive buffer
102 Speech decoding unit
103 State data holding unit
104 Lost frame compensation unit
105 Lost frame decoding unit
106, 115 Switching unit
107 Windowing addition unit
110 Speech mode determination unit
111 Speech encoding unit
112 Multiplexing unit
113 Separation unit
114 Time axis compression unit

Claims

A reception buffer that receives and accumulates encoded data in units of frames, including at least speech mode information representing characteristics of speech in units of frames;
Speech decoding means for performing normal speech decoding using the encoded data when the encoded data of the current frame is correctly received;
Lost frame compensation means for performing lost frame compensation when the encoded data of the current frame cannot be received due to loss;
Lost frame decoding means for, when the reception buffer receives encoded data with a time delay, performing speech decoding using that encoded data and the encoded data of the current frame, and updating the state data necessary for decoding by the speech decoding means;
Time axis compression means for performing time axis compression of the decoded speech signals of the lost frame and the current frame decoded by the lost frame decoding means;
Windowed addition means for selecting, according to the speech mode information, either the decoded speech signal obtained by the time axis compression means or the decoded speech signal of the current frame obtained by the lost frame decoding means, and performing windowed addition using the selected decoded speech signal and the decoded speech signal of the current frame decoded by the speech decoding means;
A speech decoding apparatus comprising:

A base station apparatus comprising the speech decoding apparatus according to claim 1.

A communication terminal apparatus comprising the speech decoding apparatus according to claim 1.

A reception buffering step for receiving and storing encoded data in units of frames, including at least speech mode information representing characteristics of speech in units of frames;
A speech decoding step of performing normal speech decoding using the encoded data when the encoded data of the current frame is correctly received;
A loss frame compensation step for performing loss frame compensation when the encoded data of the current frame cannot be received due to loss;
A lost frame decoding step of, when encoded data is received with a time delay in the reception buffering step, performing speech decoding using that encoded data and the encoded data of the current frame, and updating the state data necessary for decoding in the speech decoding step;
A time axis compression step of performing time axis compression of the decoded speech signals of the lost frame and the current frame decoded in the lost frame decoding step;
According to the speech mode information, select either the decoded speech signal obtained in the time axis compression step or the decoded speech signal of the current frame obtained in the loss frame decoding step, and the selected decoded speech signal and the A windowed addition step for performing windowed addition using the decoded speech signal of the current frame decoded in the speech decoding step;
A speech decoding method comprising:

A recording medium storing a voice decoding program and readable by a computer,
The audio decoding program includes a reception buffering procedure for receiving and storing encoded data in units of frames including at least audio mode information representing characteristics of audio in units of frames;
A speech decoding procedure for performing normal speech decoding using the encoded data when the encoded data of the current frame is correctly received;
A lost frame compensation procedure for performing lost frame compensation when the current frame encoded data cannot be received due to loss;
A lost frame decoding procedure for performing speech decoding using the encoded data and the encoded data of the current frame when the encoded data is received with a time delay, and updating state data necessary for decoding;
A time axis compression procedure for time axis compression of the lost frame decoded in the loss frame decoding procedure and the decoded speech signal of the current frame;
According to the speech mode information, select either the decoded speech signal obtained by the time axis compression procedure or the decoded speech signal of the current frame obtained by the loss frame decoding procedure, and the selected decoded speech signal and the A windowed addition procedure for performing windowed addition using the decoded speech signal of the current frame decoded by the speech decoding procedure;
Comprising
A recording medium characterized by the above.