JPH06266399A - Encoding device and speech encoding and decoding device

Info

Publication number: JPH06266399A
Application number: JP5049474A
Authority: JP
Inventors: Katsushi Seza; 勝志瀬座; Hirohisa Tazaki; 裕久田崎
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 1993-03-10
Filing date: 1993-03-10
Publication date: 1994-09-22
Anticipated expiration: 2018-07-28
Also published as: JP3431655B2

Abstract

PURPOSE: To improve the quality of decoded speech in a device that separates input speech into spectrum parameters and an excitation (sound source) signal and encodes them frame by frame, each frame having a fixed time length. CONSTITUTION: The device comprises: a first code search means that searches for the combination of an excitation model codeword and a spectrum codeword minimizing the distortion between the synthesized speech and the input speech, and outputs it as a first code search result 28; an adaptive excitation codebook 6 that holds the first or second quantized excitation signal 9 or 10 as an adaptive excitation signal; a driving excitation generating means 4 that generates a driving excitation signal from the driving excitation codewords in a driving excitation codebook 2; a second code search means that searches for the driving excitation codeword minimizing the distortion between the input speech and the synthesized speech generated from a spectrum codeword and a second quantized excitation signal 10, which is itself generated from the adaptive excitation signal and the driving excitation signal, and outputs it as a second code search result 13; and an encoding means selecting means 16 that outputs one of the first and second code search results.

Description

Detailed Description of the Invention

[0001]

[Industrial Field of Application] The present invention relates to a speech encoding and decoding device used for digital transmission or storage of speech.

[0002]

[Prior Art] A conventional speech encoding and decoding device that separates input speech into spectrum parameters and an excitation signal and encodes them in frames of fixed time length is reported in Reference 1, "A study of a speech analysis-synthesis method using a glottal source wave model" (Katsushi Seza, Hirohisa Tazaki, Kunio Nakajima, Acoustical Society of Japan Autumn Meeting, 1-6-10, pp. 209-210, 1991). This conventional method encodes the excitation signal with an excitation model defined on the differential waveform of the glottal source wave, and uses autoregressive coefficients (hereinafter AR) and moving-average coefficients (hereinafter MA) as spectrum parameters. Reference 1 examines an analysis-synthesis method based on a vocal cord source wave model (FVQ-GARMA) as a way to obtain high-quality decoded speech at low bit rates of about 2 to 3 kbps. It shows that highly natural decoded speech is obtained by vector-quantizing the vocal cord source wave model, the AR parameters, and the MA parameters. However, the decoded speech was sometimes degraded, either by errors in extracting the excitation peak position used to preselect the vocal cord source wave model, or because the position at which the model is matched (the excitation position) is not transmitted.

The conventional FVQ-GARMA method generates voiced speech by driving an ARMA filter with a vocal cord source wave model, which is defined on the differential waveform of the vocal cord source wave. When vector-quantizing the vocal cord source wave model, the AR parameters, and the MA parameters, the encoding unit selects, once per frame, the combination that maximizes SNRseg from among all combinations of the preselected codes. The decoding unit synthesizes speech while interpolating the codes obtained in each frame.
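The SNRseg criterion used for code selection can be illustrated with a short sketch. It assumes the common definition of segmental SNR (the mean of per-segment signal-to-noise ratios in dB); the source does not spell out the formula, and the function name `snr_seg` and the segment length are illustrative choices only.

```python
import math


def snr_seg(reference, decoded, seg_len=40):
    """Segmental SNR in dB: mean of 10*log10(signal/noise) over segments.

    seg_len is an illustrative value; the source does not specify one.
    """
    snrs = []
    for start in range(0, len(reference) - seg_len + 1, seg_len):
        ref = reference[start:start + seg_len]
        dec = decoded[start:start + seg_len]
        signal = sum(r * r for r in ref)
        noise = sum((r - d) ** 2 for r, d in zip(ref, dec))
        if signal == 0.0 or noise == 0.0:
            continue  # skip silent or perfectly matched segments
        snrs.append(10.0 * math.log10(signal / noise))
    # If no segment contributed, report infinity (nothing measurable was wrong).
    return sum(snrs) / len(snrs) if snrs else float("inf")
```

An encoder following the description above would evaluate such a measure for every candidate combination of preselected codes and keep the combination with the highest value.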

[0003] FIGS. 14 and 15 are block diagrams of a conventional speech encoding and decoding device based on the analysis-synthesis method (FVQ-GARMA). FIG. 14 shows the encoding unit and FIG. 15 the decoding unit. In FIGS. 14 and 15, 1 is input speech, 11 an AR codebook, 12 an AR codeword, 14 pitch period extraction means, 15 a pitch period, 19 excitation start position extraction means, 20 an excitation start position, 21 an excitation model codebook, 22 an excitation model codeword, 23 excitation model generation means, 26 an MA codebook, 27 an MA codeword, 29 an excitation model codebook, 30 an excitation model codeword, 31 excitation model generation means, 34 an MA codebook, 35 an MA codeword, 37 decoded speech, 44 an AR codebook, 45 an AR codeword, 56 an encoding result, 57 code search means, 58 decoding means, 59 a quantized excitation signal, and 60 a quantized excitation signal.

[0004] First, the encoding unit of FIG. 14 will be described. The AR codebook 11 stores a plurality of typical AR coefficient sets as AR codewords, the MA codebook 26 stores a plurality of typical MA coefficient sets as MA codewords, and the excitation model codebook 21 stores, as excitation model codewords, a plurality of typical parameter sets of an excitation model that represents one pitch period of the excitation signal. The pitch period extraction means 14 extracts the pitch period 15 from the input speech 1 and outputs it. When the preceding frame is unvoiced and the current frame is voiced, the excitation start position extraction means 19 extracts the excitation start position 20 from the input speech 1 and outputs it. The excitation model generation means 23 generates a one-pitch-period excitation signal from the excitation model codeword 22 output by the excitation model codebook 21, repeats it at the pitch period 15, and outputs the resulting signal as the quantized excitation signal 59. The code search means 57 generates synthesized speech from the excitation start position 20, the AR codeword 12, the quantized excitation signal 59, and the MA codeword 27, searches for the combination of AR codeword, MA codeword, and excitation codeword that minimizes the distortion between the input speech 1 and the synthesized speech, and outputs it as the encoding result 56.
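The exhaustive search performed by the code search means can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the codebooks are plain Python lists, the distortion is a squared error, the ARMA sign convention is the usual one with `ar[0] == 1`, and the excitation start position is omitted for brevity. All function names are hypothetical.

```python
from itertools import product


def repeat_at_pitch(cycle, pitch, length):
    """Repeat a one-pitch-period waveform every `pitch` samples."""
    one = (list(cycle) + [0.0] * pitch)[:pitch]  # zero-pad or truncate to one period
    return [one[n % pitch] for n in range(length)]


def arma_filter(excitation, ar, ma):
    """y[n] = sum_k ma[k]*x[n-k] - sum_{k>=1} ar[k]*y[n-k], assuming ar[0] == 1."""
    y = []
    for n in range(len(excitation)):
        acc = sum(ma[k] * excitation[n - k] for k in range(len(ma)) if n - k >= 0)
        acc -= sum(ar[k] * y[n - k] for k in range(1, len(ar)) if n - k >= 0)
        y.append(acc)
    return y


def search_codes(target, ar_book, ma_book, src_book, pitch):
    """Try every (AR, MA, excitation) combination; return (distortion, i, j, k)."""
    best = None
    for (i, ar), (j, ma), (k, src) in product(
        enumerate(ar_book), enumerate(ma_book), enumerate(src_book)
    ):
        excitation = repeat_at_pitch(src, pitch, len(target))
        synth = arma_filter(excitation, ar, ma)
        dist = sum((t - s) ** 2 for t, s in zip(target, synth))
        if best is None or dist < best[0]:
            best = (dist, i, j, k)
    return best
```

The cost grows with the product of the three codebook sizes, which is why the conventional method preselects a few candidates per codebook before running the joint search.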

[0005] FIG. 16 illustrates the operation of the code search means 57 in the first frame of a voiced sound. In the figure, the quantized excitation signal and synthesized speech drawn with solid lines belong to the current frame, and those drawn with dotted lines belong to the next frame. In voiced frames, the code search means 57 encodes the input speech in units of the pitch period. When the current frame is the start of a voiced sound, the quantized excitation signal 59 is laid out in pitch-period units from the excitation start position 20, and the input speech 1 up to the point where this layout extends past the current frame is taken as the encoding target of the frame, for which synthesized speech is generated.

[0006] In FIG. 16, synthesized speech is generated with the span from the excitation start position 20 through pitch periods P1 and P2 as the encoding target of the current frame. Because encoding proceeds in units of the pitch period, frame boundaries and pitch period boundaries do not coincide. In this example, time T3 does not coincide with time F2, and encoding extends beyond the current frame by the interval from F2 to T3. For a voiced frame other than the first frame of a voiced sound, the quantized excitation signal 59 is laid out following the quantized excitation signal of the preceding frame until it extends past the current frame, and synthesized speech is generated. In the figure, synthesized speech for the next frame is generated starting after pitch period P2. That is, the next frame is encoded using pitch periods P3, P4, and P5, and synthesized speech is generated.
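The pitch-synchronous layout described for FIG. 16 can be sketched as below. The function name, the sample-index representation, and the numeric values in the usage note are assumptions made for illustration; the source gives no concrete frame or pitch lengths.

```python
def lay_out_pitch_cycles(start, pitch_periods, frame_end):
    """Place pitch cycles end to end from `start` until the frame end is reached.

    Returns (bounds, overshoot): bounds lists the (begin, end) sample indices
    of each cycle used for this frame, and overshoot is how many samples the
    last cycle extends past frame_end (0 if it does not extend past it).
    """
    bounds, pos = [], start
    for period in pitch_periods:
        if pos >= frame_end:
            break  # the frame is already covered
        bounds.append((pos, pos + period))
        pos += period
    return bounds, max(0, pos - frame_end)
```

For example, with a hypothetical frame ending at sample 160, a start position of 30, and pitch periods of 70 and 72 samples, the second cycle ends at sample 172, so the frame's encoding target overshoots the frame boundary by 12 samples, which corresponds to the interval between F2 and T3 in the description.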

[0007] Next, the decoding unit of FIG. 15 will be described. In the figure, the AR codebook 44, the excitation model codebook 29, and the MA codebook 34 are identical to the AR codebook 11, the excitation model codebook 21, and the MA codebook 26 of the encoding unit, respectively. The excitation model generation means 31 generates the quantized excitation signal 60 from the pitch period 15 and the excitation model codeword 30 in the excitation model codebook 29 that corresponds to the encoding result 56. The decoding means 58 generates the decoded speech 37 from the quantized excitation signal 60, the AR codeword 45 in the AR codebook 44, and the MA codeword 35 in the MA codebook 34, the codewords being those that correspond to the encoding result 56.

[0008] FIG. 17 illustrates the operation of the decoding means 58. The decoding means 58 lays out the quantized excitation signal 60 from the start of the current frame until it extends past the frame, and generates the decoded speech 37. In the figure, decoded speech is generated from the start of the frame using pitch periods 91, 92, and 93. Because decoded speech is also generated in units of the pitch period, it may be generated past the frame boundary. In FIG. 17, the end time S3 of pitch period 93 is later than the frame end time F2, so decoded speech for the current frame continues to be generated until time S3 even after the frame ends at F2.

[0009] In the next frame, the quantized excitation signal 60 shown with dotted lines is laid out following the speech decoded so far, and the decoded speech 37 is generated. In FIG. 17, pitch periods 94, 95, and 96 are used to generate the decoded speech of the next frame. Because this generation is also performed in units of the pitch period, when decoded speech generation for the previous frame ran past that frame, the decoded speech of the next frame is likewise generated with the same offset, as shown in FIG. 17.

[0010]

[Problems to Be Solved by the Invention] The conventional speech encoding and decoding device preselects the vocal cord source wave code using the excitation peak position and the vocal cord source wave code selected in past frames as references. However, automatic extraction of the excitation peak position is error-prone at word onsets and in transient segments, so the preselection sometimes failed. FIG. 18(a) shows a residual waveform, and FIGS. 18(b) and 18(c) show differential waveforms of the vocal cord source wave model. Compared with the case where the excitation peak position is extracted correctly (b), when it is extracted incorrectly (c) the wrong vocal cord source wave code is selected and SNRseg degrades sharply. Thus, because only a quantized excitation signal based on an excitation model is used to encode the excitation signal, the quality of the decoded speech sometimes degraded for speakers whom the excitation model fits poorly. In addition, because the MA parameters and the excitation model, whose characteristics vary with the pitch period, are each quantized with a single fixed codebook, the quality of the decoded speech sometimes degraded.

Furthermore, the encoding unit determines the excitation position at the start of a voiced sound and its pitch length so that SNRseg is maximized when the excitation is repeated while the pitch length is finely adjusted by interpolation. Because this excitation position is not transmitted to the decoding unit, the results of interpolating each code can differ greatly between the encoding unit and the decoding unit, degrading the decoded speech quality. That is, since the excitation start position in the first frame of a voiced sound is not transmitted to the decoding unit, the number of excitation models contained in the same frame may differ between the encoding unit and the decoding unit, as shown in FIGS. 16 and 17. If power or pitch period fluctuates strongly in such a frame, the first quantized excitation signal of the decoding unit differs greatly from that of the encoding unit, and the quality of the decoded speech sometimes degraded.
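The mismatch between encoder and decoder can be made concrete with a small sketch. It assumes a constant pitch period and counts how many pitch cycles are needed to cover one frame when the layout begins at the extracted start position (encoder) versus at the frame head (decoder, which never receives the start position). The function name and all numbers are hypothetical.

```python
def cycles_to_cover(start, pitch, frame_len):
    """Count pitch cycles laid from `start` until the frame end is reached or passed."""
    count, pos = 0, start
    while pos < frame_len:
        pos += pitch
        count += 1
    return count


# Hypothetical values: 160-sample frame, 50-sample pitch period.
encoder_cycles = cycles_to_cover(60, 50, 160)  # encoder starts at the extracted position
decoder_cycles = cycles_to_cover(0, 50, 160)   # decoder starts at the frame head
```

With these values the encoder fits 2 cycles into the frame and the decoder 4, so any per-frame interpolation of codes is spread over a different number of cycles on each side.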

[0011] The present invention has been made to solve the above problems, and its object is to improve the quality of decoded speech.

[0012]

[Means for Solving the Problems] To reduce degradation of sound quality, the speech encoding and decoding device according to claim 1 of the present invention operates in multiple modes, for example GARMA and CELP. It secures SNRseg by using an excitation built from the adaptive code and the driving excitation code used in CELP-type coders (see FIG. 18(d)), and selects whichever gives the better SNRseg: encoding with FVQ-GARMA, or encoding with the adaptive codebook and the driving excitation codebook.

[0013] The speech encoding and decoding device according to claim 2 of the present invention comprises, in the encoding unit: first code search means that searches for the combination of excitation model codeword and spectrum codeword minimizing the distortion between the input speech and the synthesized speech generated from the first quantized excitation signal and a spectrum codeword in the spectrum codebook, outputs the search result, together with its distortion, to the encoding means selecting means as a first code search result, and outputs the first quantized excitation signal to the adaptive excitation codebook; an adaptive excitation codebook that stores, as an adaptive excitation signal, the first quantized excitation signal or the second quantized excitation signal obtained in the preceding frame; a driving excitation codebook that stores a plurality of excitation signals prepared in advance as driving excitation codewords; driving excitation generating means that generates a driving excitation signal by repeating a driving excitation codeword in the driving excitation codebook at the pitch period; second code search means that searches for the driving excitation codeword minimizing the distortion between the input speech and the synthesized speech generated from a spectrum codeword in the spectrum codebook and the second quantized excitation signal, the latter being generated from the adaptive excitation signal in the adaptive excitation codebook and the driving excitation signal, outputs the search result, together with its distortion, to the encoding means selecting means as a second code search result, and outputs the second quantized excitation signal to the adaptive excitation codebook; and encoding means selecting means that selects, of the first and second code search results, the one with the smaller distortion as the encoding result, outputs it as the encoding result of the current frame, and outputs an encoding means selection signal indicating which code search result was selected.

The device further comprises, in the decoding unit: decoding means selecting means that selects between first decoding means and second decoding means according to the encoding means selection signal received from the encoding unit; the same spectrum codebook as in the encoding unit; the same excitation model codebook as in the encoding unit; first decoding means that generates decoded speech from the first quantized excitation signal and the spectrum codeword in the spectrum codebook corresponding to the encoding result received from the encoding unit; the same adaptive excitation codebook as in the encoding unit; the same driving excitation codebook as in the encoding unit; driving excitation generating means that generates a driving excitation signal from the pitch period and the driving excitation codeword in the driving excitation codebook corresponding to the encoding result; and second decoding means that generates decoded speech from the spectrum codeword, the adaptive excitation signal in the adaptive excitation codebook, and the driving excitation signal output by the driving excitation generating means.

[0014] In the speech encoding and decoding device according to claim 3 of the present invention, the adaptive excitation codebooks of the encoding unit and the decoding unit each comprise first excitation storage means that holds the first quantized excitation signal, second excitation storage means that holds the second quantized excitation signal, and codeword switching means that switches between the first and second excitation storage means. The device further comprises second code search means that selects from the adaptive excitation codebook the adaptive excitation signal minimizing the distortion between the synthesized speech and the input speech, and outputs which one was selected as part of the second encoding result.

[0015] In the speech encoding and decoding device according to claims 4 and 5 of the present invention, the spectrum codebook, the excitation model codebook, and the driving excitation codebook of the encoding unit and the decoding unit each comprise a plurality of sub-codebooks corresponding to pitch periods, and sub-codebook switching means that switches among the sub-codebooks according to the input pitch period.
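Sub-codebook switching by pitch period might look like the following sketch. The class name, the use of pitch-range boundaries, and the boundary values are assumptions for illustration; the source states only that sub-codebooks correspond to pitch periods and are switched according to the input pitch period.

```python
import bisect


class SwitchedCodebook:
    """A codebook split into sub-codebooks, one per pitch-period range."""

    def __init__(self, pitch_edges, sub_codebooks):
        # pitch_edges: ascending boundaries between ranges (hypothetical values);
        # N boundaries split the pitch axis into N + 1 ranges.
        if len(sub_codebooks) != len(pitch_edges) + 1:
            raise ValueError("need one sub-codebook per pitch range")
        self.pitch_edges = list(pitch_edges)
        self.sub_codebooks = list(sub_codebooks)

    def select(self, pitch_period):
        """Return the sub-codebook whose range contains pitch_period."""
        index = bisect.bisect_right(self.pitch_edges, pitch_period)
        return self.sub_codebooks[index]
```

Because the pitch period is available to both the encoding unit and the decoding unit, both sides can make the same selection without extra side information, provided they share the same boundaries.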

[0016] The speech encoding and decoding device according to claims 6 and 7 of the present invention transmits only the excitation position of the first frame of a voiced sound, and does so in the unvoiced frame immediately preceding it. In subsequent frames, the encoding unit and the decoding unit perform the same processing, repeating the excitation while interpolating the pitch length, so the excitation position need not be transmitted for every frame. That is, the encoding unit comprises excitation start position extraction means that, for the first frame of a voiced sound, extracts the excitation start position from the input speech and outputs it to the decoding unit in the preceding unvoiced frame, and the decoding unit comprises first decoding means that generates decoded speech by synchronizing the first quantized excitation signal with the excitation start position received from the encoding unit.

[0017]

[Operation] In the invention of claim 1, speech is encoded with different methods, for example GARMA and CELP, and the method giving the better result is selected, so the encoding quality is higher than with either method alone.

[0018] In the invention of claim 2, the encoding means selecting means selects whichever of the first code search means and the second code search means yields the smaller distortion between the input speech and the synthesized speech.

[0019] In the invention of claim 3, the second code search means selects, as the adaptive excitation signal, whichever of the first quantized excitation signal and the second quantized excitation signal held in the adaptive excitation codebook yields the smaller distortion between the input speech and the synthesized speech.

[0020] In the inventions of claims 4 and 5, the spectrum codebook, the excitation model codebook, and the driving excitation codebook each switch among their own sub-codebooks according to the pitch period.

[0021] In the inventions of claims 6 and 7, the excitation start position in the first frame of a voiced sound is transmitted to the decoding unit in the unvoiced frame that precedes that voiced frame.

[0022]

[Example]

Example 1. FIGS. 1 and 2 are block diagrams of an embodiment of the speech encoding and decoding device according to the present invention, and the operation of the invention is described below with reference to them. FIG. 1 shows the encoding unit and FIG. 2 the decoding unit. Parts in FIGS. 1 and 2 that are identical to those in FIGS. 8 and 9 carry the same reference numerals and are not described again. In the figures, 2 is a driving excitation codebook, 3 a driving excitation codeword, 4 driving excitation generating means, 5 a driving excitation signal, 6 an adaptive excitation codebook, 7 an adaptive excitation signal, 8 second code search means, 9 a first quantized excitation signal, 10 a second quantized excitation signal, 13 a second code search result, 16 encoding means selecting means, 17 an encoding means selection signal, 18 an encoding result, 25 first code search means, 28 a first code search result, 33 first decoding means, 38 decoding means selecting means, 39 an encoding result, 40 an adaptive excitation codebook, 41 an adaptive excitation signal, 42 a second quantized excitation signal, 43 second decoding means, 46 a driving excitation codebook, 47 a driving excitation codeword, 48 driving excitation generating means, and 49 a driving excitation signal.

[0023] First, the encoding unit will be described. As shown in FIG. 10, in the first frame of a voiced sound the first code search means 25 synchronizes the first quantized excitation signal 24 with the excitation start position 20 and generates synthesized speech from this signal, the AR codeword 12, and the MA codeword 27. In other voiced frames, it lays out the first quantized excitation signal 24 following the synthesized speech obtained in the preceding frame and generates synthesized speech. It then searches for the combination of AR codeword 12, MA codeword 27, and excitation model codeword 22 that minimizes the distortion between this synthesized speech and the input speech 1, outputs the search result, together with its distortion, to the encoding means selecting means 16 as the first code search result 28, and outputs the first quantized excitation signal 24 of that combination to the adaptive excitation codebook 6 as the first quantized excitation signal 9.

[0024] The driving excitation codebook 2 stores a plurality of driving excitation codewords, for example typical one-pitch-period signals taken from LPC residual signals, or Gaussian noise signals. The driving excitation generating means 4 generates the driving excitation signal 5 by repeating a driving excitation codeword 3 from the driving excitation codebook 2 at the pitch period 15. Hereinafter, the method of operation of the first code search means 25 is called the GARMA method.

[0025] The second code search means 8 generates synthesized speech from a spectrum codeword 12 in the spectrum codebook 11 and the quantized excitation signal generated from the driving excitation signal 5 and the adaptive excitation signal 7 in the adaptive excitation codebook 6. It searches for the combination of driving excitation codeword 3 and spectrum codeword 12 that minimizes the distortion between the synthesized speech and the input speech 1, outputs the search result, together with its distortion, to the encoding means selecting means 16 as the second code search result 13, and outputs the second quantized excitation signal 10 of that combination to the adaptive excitation codebook 6. Hereinafter, the method that the second code search means 8 uses for encoding is called the CELP method.
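How the second quantized excitation signal might be formed can be sketched as below. The source says only that it is generated from the adaptive excitation signal and the driving excitation signal; the gain-weighted sum used here is the form typical of CELP coders and is an assumption, as are the function names and the gain parameters.

```python
def driving_excitation(codeword, pitch, length):
    """Repeat a driving excitation codeword at the pitch period."""
    cycle = (list(codeword) + [0.0] * pitch)[:pitch]  # zero-pad or truncate to one period
    return [cycle[n % pitch] for n in range(length)]


def second_quantized_excitation(adaptive, driving, g_adaptive=1.0, g_driving=1.0):
    """Combine adaptive and driving excitation (assumed gain-weighted sum)."""
    return [g_adaptive * a + g_driving * d for a, d in zip(adaptive, driving)]
```

In a search loop, `driving_excitation` would be evaluated for each candidate codeword, and the combined signal would be passed through the synthesis filter before measuring distortion against the input speech.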

[0026] The encoding means selecting means 16 selects, of the first code search result 28 and the second code search result 13, the one with the smaller distortion as the encoding result 18 of the current frame, and outputs the encoding result 18 together with the encoding means selection signal 17 indicating which was selected.
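The selection step is simple enough to show in a few lines. This is a sketch with hypothetical names; the encoding of the selection signal as 0 or 1 and the tie-breaking rule in favor of the first result are assumptions not stated in the source.

```python
from typing import NamedTuple, Tuple


class SearchResult(NamedTuple):
    codes: Tuple[int, ...]  # codeword indices found by one code search means
    distortion: float       # distortion between input and synthesized speech


def select_encoding(first: SearchResult, second: SearchResult):
    """Return (selection_signal, result): 0 for the first (GARMA) search result,
    1 for the second (CELP) search result, whichever has the smaller distortion."""
    if first.distortion <= second.distortion:
        return 0, first
    return 1, second
```

The selection signal costs one additional bit per frame in this sketch and tells the decoding unit which decoding means to use.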

[0027] FIG. 3 illustrates the operation of the adaptive excitation codebook 6. When the first code search means is selected for the current frame, the adaptive excitation codebook 6 stores the first quantized excitation signal 9 in the excitation storage means 50; when the second code search means is selected, it stores the second quantized excitation signal 10 in the excitation storage means 50. The stored signal is output as the adaptive excitation signal 7.

[0028] The conventional adaptive excitation codebook 6 stores the quantized excitation signal supplied by its code search means. In this embodiment, by contrast, the quantized excitation signal stored in the adaptive excitation codebook 6 is the latest one, chosen from the first quantized excitation signal 9 output by the first code search means 25 and the second quantized excitation signal 10 output by the second code search means 8, and it is stored in the excitation storage means 50. Which of the two is the latest can be determined from the encoding means selection signal 17 output by the encoding means selecting means. The adaptive excitation codebook 6 therefore receives the encoding means selection signal 17 and operates a switch so that either the first quantized excitation signal 9 or the second quantized excitation signal 10 is routed into the excitation storage means 50. In this way the adaptive excitation codebook 6 can supply the most recently used quantized excitation signal to the second code search means.
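
The switching behaviour described above can be sketched as a small class. This is only an illustration under stated assumptions: the class name, the fixed-length sample buffer, and the 0/1 encoding of the selection signal are hypothetical and are not taken from the patent.

```python
class AdaptiveExcitationCodebook:
    """Single-store adaptive codebook driven by the selection signal.

    `memory` plays the role of the excitation storage means 50. Keeping a
    fixed-length history of the newest samples is an assumption; the source
    only says that the selected signal is stored.
    """

    def __init__(self, length):
        if length <= 0:
            raise ValueError("length must be positive")
        self.memory = [0.0] * length

    def update(self, selection_signal, first_excitation, second_excitation):
        # The switch: 0 routes the first quantized excitation signal into
        # storage, anything else routes the second.
        chosen = first_excitation if selection_signal == 0 else second_excitation
        # Append the newest samples and keep only the most recent `length`.
        self.memory = (self.memory + list(chosen))[-len(self.memory):]

    def adaptive_signal(self):
        # Returned copy stands in for the adaptive excitation signal 7.
        return list(self.memory)
```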

[0029] Next, the decoding unit 1b of FIG. 2 is described. In the figure, the driving excitation codebook 46 and the adaptive excitation codebook 40 are identical to the driving excitation codebook 2 and the adaptive excitation codebook 6 of the encoding unit 1a, respectively. In accordance with the encoding means selection signal 17, the decoding means selecting means 38 passes the encoding result 18 unchanged, as the encoding result 39, to either the first decoding means 33 or the second decoding means 43.

[0030] The first decoding means 33 generates the decoded speech 37 using the first quantized excitation signal 32 together with the MA codeword 35 in the MA codebook 34 and the AR codeword 45 in the spectrum codebook 44 that correspond to the encoding result 39, and outputs the first quantized excitation signal 32 unchanged to the adaptive excitation codebook 40 as the first quantized excitation signal 36.

[0031] The driving excitation generating means 48 generates the driving excitation signal 49 from the pitch period 15 and the driving excitation codeword 47 in the driving excitation codebook 46 that corresponds to the encoding result 39. The second decoding means 43 generates the decoded speech 37 using a quantized excitation signal formed from the adaptive excitation signal 41 in the adaptive excitation codebook 40 and the driving excitation signal 49, together with the AR codeword 45 in the spectrum codebook 44 that corresponds to the encoding result 39, and outputs that quantized excitation signal to the adaptive excitation codebook as the second quantized excitation signal 42.
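
As a rough sketch of what the second decoding means does, the code below forms an excitation from an adaptive and a driving component and passes it through an all-pole (AR) synthesis filter. Several details are assumptions rather than statements from the patent: the explicit gains, the function names, and the sign convention A(z) = 1 + a[0]z^-1 + a[1]z^-2 + ... for the AR coefficients.

```python
def combine_excitation(adaptive, driving, gain_adaptive, gain_driving):
    """Weighted sum of the adaptive and driving excitation signals.

    The gains are hypothetical parameters; this paragraph of the source does
    not say how the two signals are weighted.
    """
    return [gain_adaptive * a + gain_driving * d for a, d in zip(adaptive, driving)]


def ar_synthesis(excitation, ar_coeffs):
    """All-pole filter: y[n] = x[n] - sum_k ar_coeffs[k] * y[n - 1 - k]."""
    out = []
    for n, x in enumerate(excitation):
        acc = x
        for k, a in enumerate(ar_coeffs):
            if n - 1 - k >= 0:
                acc -= a * out[n - 1 - k]
        out.append(acc)
    return out
```

In this sketch the output of `combine_excitation` would also be what is fed back to the adaptive excitation codebook as the second quantized excitation signal 42.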

[0032] As described above, in this embodiment the first code search means encodes using only a quantized excitation signal built from the excitation model stored in the excitation model codebook 21. The second code search means, by contrast, uses the adaptive excitation codebook and therefore encodes using the difference from the quantized excitation signal encoded immediately before. The feature of this embodiment is that these two code search means, which use different encoding methods, are each run, their results are compared, and the one with the smaller encoding distortion is selected. In other words, this embodiment uses two methods, GARMA and CELP, and selects and outputs the better of the two encoding results.

[0033] Embodiment 2. Embodiment 1 showed comparison and selection between two methods, GARMA and CELP, but the two encoding methods are not limited to these; other methods may be used. The two may also be the same method, with one of them improved or otherwise modified. Furthermore, the combination is not limited to two methods; three or more methods may be combined.

[0034] Embodiment 3. FIG. 4 is a configuration diagram of the adaptive excitation codebook 6 in one embodiment of the speech encoding/decoding apparatus according to the present invention; the operation of the adaptive excitation codebook 6 is described below with reference to this figure. Parts identical to those in FIG. 3 bear the same reference numerals. When the first code search means is selected, the adaptive excitation codebook 6 stores the first quantized excitation signal 9 in the first excitation storage means 51; when the second code search means is selected, it stores the second quantized excitation signal 10 in the second excitation storage means 52. The switching means 53 outputs the signals stored in the first excitation storage means 51 and the second excitation storage means 52, each in turn, as the adaptive excitation signal 7. The second code search means 8 selects the adaptive excitation signal 7 that gives the smaller distortion between the synthesized speech and the input speech, and includes that selection in the code search result 13 that it outputs.

[0035] That is, the second code search means 8 outputs a selection signal 8a to the adaptive excitation codebook 6 to switch between the excitation signals stored in the first excitation storage means 51 and the second excitation storage means 52. The first excitation storage means 51 holds the latest quantized excitation signal 9 output from the first code search means. The second excitation storage means 52 holds the latest quantized excitation signal output from the second code search means. By outputting the selection signal 8a and operating the switching means 53, the second code search means 8 receives the quantized excitation signals stored in the first excitation storage means 51 and the second excitation storage means 52 in turn and attempts encoding with each of them. It then encodes using whichever gives the smaller distortion and outputs the result as the code search result.
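
The try-both-and-keep-the-better step can be sketched as below. This is a simplification under stated assumptions: a real code search would synthesize speech from each candidate before measuring distortion, whereas here a caller-supplied `distortion` function is applied directly to the stored signals. The function names are hypothetical. Accepting a list of stores also covers the case of more than two storage means.

```python
def squared_error(x, y):
    """Sum of squared differences, used here as a stand-in distortion measure."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def select_adaptive_signal(target, stores, distortion=squared_error):
    """Try the signal held in each storage means and keep the best one.

    Returns (index, signal), where index identifies the chosen store and would
    be reported as part of the code search result. On a tie the earliest
    store wins, which is an arbitrary choice for this example.
    """
    index = min(range(len(stores)), key=lambda i: distortion(target, stores[i]))
    return index, stores[index]
```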

[0036] The adaptive excitation codebook 40 of the decoding unit is identical to the adaptive excitation codebook 6 of FIG. 4. The second decoding means 43 of the decoding unit selects the adaptive excitation signal 41 from the adaptive excitation codebook 40 in accordance with the encoding result 39.

[0037] Embodiment 4. Embodiment 3 showed the case of two code search means. When there are three or more code search means, the adaptive excitation codebook 6 shown in FIG. 4 contains one excitation storage means for each code search means, and the switching means 53 switches among the quantized excitation signals stored in these three or more excitation storage means.

[0038] Embodiment 5. FIG. 5 is a configuration diagram of the MA codebook 26 in one embodiment of the speech encoding/decoding apparatus according to the present invention; the operation of the MA codebook 26 is described below with reference to this figure. Parts identical to those in FIG. 1 bear the same reference numerals. The MA codebook 26 has a plurality of sub-codebooks 54. The sub-codebook switching means 55 selects one of the sub-codebooks according to the input pitch period 15 and outputs the MA codeword 27 from the selected sub-codebook.

[0039] For example, in the figure, sub-codebook 1 stores the MA codebook corresponding to a pitch period of 15 ms, sub-codebook 2 stores the codebook corresponding to a pitch period of 16 ms, and sub-codebook 3 stores the codebook corresponding to a pitch period of 17 ms. Codebooks are stored in this way from 15 ms up to, for example, 25 ms, so that the sub-codebook switching means 55 can select a sub-codebook based on the input pitch period 15. For example, when the pitch period 15 is 16 ms, the sub-codebook switching means 55 selects sub-codebook 2 and outputs the MA codeword 27 from it. The MA codebook 34 of the decoding unit has the same configuration as the MA codebook 26. The AR codebook, the excitation model codebook, or the driving excitation codebook may also be configured as in FIG. 5.
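
A pitch-switched codebook of this kind can be sketched with a dictionary keyed by pitch period. The class name and the nearest-period fallback are assumptions for illustration: the source does not say what happens when the pitch period falls between or outside the stored values.

```python
class SwitchedCodebook:
    """Codebook made of sub-codebooks, one per pitch period (in ms)."""

    def __init__(self, sub_codebooks):
        # sub_codebooks maps a pitch period in milliseconds to a list of codewords.
        self.sub_codebooks = dict(sub_codebooks)

    def select(self, pitch_period_ms):
        # Pick the sub-codebook whose pitch period is closest to the input.
        # Falling back to the nearest period is an assumption of this sketch.
        key = min(self.sub_codebooks, key=lambda p: abs(p - pitch_period_ms))
        return self.sub_codebooks[key]
```

For example, `SwitchedCodebook({15: [...], 16: [...], 17: [...]}).select(16)` returns the sub-codebook stored for 16 ms, mirroring the example in the text.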

[0040] As described above, the feature of this embodiment is that each codebook contains a plurality of codebooks prepared according to pitch period. Speech varies with, for example, the sex of the speaker, the speaking rate, and the pitch of the voice, and whether the input speech is male or female, fast or slow, or high or low is often reflected in the pitch period. That is, the appropriate MA parameters and excitation model differ depending on the pitch period. This embodiment therefore prepares a plurality of MA codebooks and excitation model codebooks according to pitch period and quantizes using the codebook that matches the pitch period.

[0041] Embodiment 6. FIGS. 6 and 7 are configuration diagrams of one embodiment of the speech encoding/decoding apparatus according to the present invention, and the operation of the invention is described below with reference to these figures. FIG. 6 shows the encoding unit and FIG. 7 shows the decoding unit. Parts identical to those in FIGS. 1 and 2 bear the same reference numerals, and their description is omitted. In the encoding unit, when the frame is the first frame of a voiced sound, the excitation start position 20 is transmitted to the decoding unit in the unvoiced frame that precedes it. That is, the excitation start position 20 is encoded and transmitted before the first voiced frame is encoded and transmitted. The transmission of the voiced frame is consequently delayed by one frame, but a one-frame delay only shifts the time of decoding in the decoding unit and has little effect on decoding quality. The excitation start position 20 may instead be encoded together with the first frame of the voiced sound. In that case, however, the amount of information available for encoding the first voiced frame is reduced by the bits spent on the excitation start position 20.

[0042] In the decoding unit, it is determined in advance that the first decoding means 33 decodes the first frame of a voiced sound. When the frame is the first frame of a voiced sound, the first decoding means 33 lays out the first quantized excitation signal 32 starting from the received excitation start position 20 and generates the decoded speech 37.
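
Laying out the excitation from the received start position can be sketched as follows. This is an illustrative assumption about the mechanics: the function name, the sample-index representation of the start position, and the zero fill before the start position and between pulses are not specified in the source.

```python
def place_excitation(one_pitch_waveform, pitch_period, start_position, frame_length):
    """Repeat a one-pitch excitation waveform from `start_position` onward.

    Samples before `start_position` stay at zero, so nothing is synthesized
    ahead of the voiced onset. `pitch_period` and `start_position` are in
    samples; `pitch_period` must be positive and `start_position` non-negative.
    """
    frame = [0.0] * frame_length
    for n in range(start_position, frame_length):
        phase = (n - start_position) % pitch_period
        # If the waveform is shorter than the pitch period, the rest stays zero.
        if phase < len(one_pitch_waveform):
            frame[n] = one_pitch_waveform[phase]
    return frame
```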

[0043] This is explained using FIG. 17, which illustrates the conventional example. Conventionally, decoding was performed from the beginning of the frame. In this embodiment decoding starts from the excitation start position 20, that is, from time T1 in FIG. 17. Conventionally, because decoding started at time F1, synthesized speech was produced between times F1 and T1 even though none was needed there. In this embodiment decoding starts at the excitation start position 20, that is, at time T1, so this unnecessary synthesized speech is eliminated. Moreover, because the decoding start position coincides with the excitation start position, the pitch-period misalignment also disappears and decoded speech quality improves. As shown in FIG. 17, in the conventional example the pitch periods P1, P2, P3 of the input speech 1 are offset from the pitch periods Q1, Q2, Q3 of the decoded speech 37, and this offset degrades the quality of the decoded speech. In this embodiment, by contrast, the decoded speech also starts from the excitation start position 20, so its pitch periods coincide with those of the input speech, which helps improve decoded speech quality.

[0044] Embodiment 7. Embodiment 6 showed the case in which the excitation start position 20 is input to the first decoding means 33, but it may instead be input to the second decoding means 43. Embodiment 6 presupposes, as stated above, that the first decoding means decodes the first frame of a voiced sound, so it sufficed to input the excitation start position 20 only to the first decoding means 33. If, however, it is not known which of the first and second encoding means will encode the first frame of a voiced sound, the excitation start position 20 can be input to both the first and second decoding means, so that whichever of them is used can start decoding from the excitation start position 20.

[0045] Embodiment 8. In Embodiments 1 to 7, when the encoding means selecting means selects the first code search means, the first quantized excitation signal is stored in the adaptive excitation codebook. It is also possible to store instead the signal obtained by driving the MA filter with the first quantized excitation signal.

[0046] Embodiment 9. Embodiments 1 to 8 use AR and MA as the spectral parameters, but other spectral parameters, such as AR alone or the cepstrum, may also be used.

[0047] Embodiment 10. Embodiment 5 showed the case in which a plurality of codebooks is prepared within each codebook used in the speech encoding/decoding apparatus of FIG. 1 or FIG. 2, but switching among such codebooks by pitch period can also be applied to the conventional speech encoding/decoding apparatus shown in FIGS. 8 and 9. That is, selecting a codebook according to the pitch period 15 is applicable not only to the method of Embodiment 1 or the conventional method but also to other methods.

[0048] Embodiment 11. The above embodiments describe a speech encoding/decoding apparatus in which speech is both encoded and decoded, but the encoding part and the decoding part of each embodiment can also be applied to an encoding apparatus that only encodes or to a decoding apparatus that only decodes.

[0049] Embodiment 12. The above embodiments describe encoding and decoding speech, but speech in this invention is not limited to sounds produced by the human vocal tract; the invention also applies to sounds uttered by non-human creatures such as animals. Likewise, it is not limited to sounds uttered by living creatures: it can be applied to encoding and decoding anything that is input as sound, for example the sound of a musical instrument or a friction sound. Nor does the sound need to be perceptible to humans; it may be ultrasound or infrasound, which the human ear cannot detect.

[0050] Evaluation experiment. FIG. 8 shows the configuration of this evaluation experiment for the case of two subframes. In the figure, MODE0 denotes the FVQ-GARMA method and MODE1 denotes the CELP method. First, several sets of LSPs (AR parameters) are preselected for each frame. Next, for each LSP, the MODE with the better SNRseg is selected for each subframe. Finally, the combination of LSP and per-subframe MODEs that maximizes the SNRseg of the whole frame is selected.
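
The two-stage selection above can be sketched as follows, assuming the synthesized subframes for every LSP candidate and mode have already been computed. The function names and data layout are hypothetical, and SNRseg is taken here in its usual sense, the mean of per-segment SNRs in dB with each subframe treated as one segment, which is an assumption about the exact measure used in the experiment.

```python
import math


def snr_db(reference, synthesized):
    """Signal-to-noise ratio of one segment in dB."""
    signal = sum(x * x for x in reference)
    noise = sum((x - y) ** 2 for x, y in zip(reference, synthesized))
    if noise == 0:
        return float("inf")
    if signal == 0:
        return float("-inf")
    return 10.0 * math.log10(signal / noise)


def best_modes_for_lsp(ref_subframes, synth_by_mode):
    """For one LSP candidate, pick the best mode per subframe.

    synth_by_mode[m][i] is the signal synthesized by mode m for subframe i.
    Returns (modes, snrseg), where snrseg is the mean of the chosen SNRs.
    """
    modes, snrs = [], []
    for i, ref in enumerate(ref_subframes):
        scores = [snr_db(ref, synth[i]) for synth in synth_by_mode]
        best = max(range(len(scores)), key=scores.__getitem__)
        modes.append(best)
        snrs.append(scores[best])
    return modes, sum(snrs) / len(snrs)


def choose_lsp_and_modes(ref_subframes, candidates):
    """candidates[j] is the synth_by_mode table for preselected LSP j."""
    results = [best_modes_for_lsp(ref_subframes, c) for c in candidates]
    best = max(range(len(results)), key=lambda j: results[j][1])
    return best, results[best][0]
```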

[0051] FIG. 9 shows the internal configuration of MODE0. MODE0 is the same as FVQ-GARMA: from all combinations of the preselected glottal source wave model codes and MA codes, the one that maximizes SNRseg is selected.

[0052] FIG. 10 shows the internal configuration of MODE1. The CELP of MODE1 uses pitch-synchronous processing so that it stays synchronized with MODE0. First, one pitch length of the adaptive codebook is repeated to form the vector P. If the previous subframe was MODE0, the adaptive codebook holds the output of the MA filter driven by the glottal source wave model. Next, one pitch length of the driving excitation codebook is repeated to form the vector C. Then the ratio of the gains of the vectors P and C that maximizes SNRseg is determined. The short-term prediction residual signal was used to train the driving excitation codebook. In unvoiced frames the subframe length is long, so the subframe is divided into several parts and driven by white noise; the adaptive codebook is not used. The codebooks were trained on 20 short Japanese sentences uttered by five male and five female speakers. The driving excitation codebook, however, was trained on 10 sentences. The evaluation data were 10 short Japanese sentences outside the training set (five male and five female speakers not used in training each uttered one different sentence).
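
The pitch-synchronous construction of the vectors P and C, and a search for their gain ratio, can be sketched as below. The names are hypothetical, and two simplifications are assumed: the mix is parameterized by a single ratio r applied as r*P + (1 - r)*C, and the search minimizes squared error over a fixed candidate list. For a fixed target segment, minimizing squared error is equivalent to maximizing that segment's SNR, but this is a stand-in for the SNRseg criterion in the text, not the experiment's actual procedure.

```python
def repeat_pitch(history, pitch, length):
    """Repeat the last `pitch` samples of `history` to fill `length` samples.

    Requires pitch > 0 and len(history) >= pitch.
    """
    one_period = history[-pitch:]
    return [one_period[n % pitch] for n in range(length)]


def best_gain_ratio(target, p, c, ratios):
    """Pick the ratio r from `ratios` whose mix r*P + (1-r)*C best matches target."""
    def error(r):
        return sum((t - (r * a + (1.0 - r) * b)) ** 2
                   for t, a, b in zip(target, p, c))
    return min(ratios, key=error)
```

Here `repeat_pitch` would be applied once to the adaptive codebook contents to obtain P and once to a driving excitation codeword to obtain C.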

[0053] To examine the performance of this method, synthesized speech was produced under the conditions shown in FIG. 11, with the excitation peak positions extracted automatically. FIG. 12 shows the SNRseg and CD (cepstrum distortion) of this method at 2.4 kbps (MGARMA) separately for male and female speakers. The results show that male speakers score worse in both SNRseg and CD. The main cause is thought to be that MODE1 is selected more often for male speakers, whose pitch fluctuation is large, and that MODE1 repeats the excitation at a constant pitch length without interpolating the pitch length, so the excitation peak positions drift further.

[0054] To examine the subjective quality of the synthesized speech of this method, synthesized speech was produced under the conditions shown in FIG. 11 and a simple paired-comparison test was carried out with six subjects. This method at 2.4 kbps with automatically extracted excitation peak positions (MGARMA) was compared with conventional 2.4 kbps FVQ-GARMA with manually corrected excitation peak positions (MFVQ), conventional 2.4 kbps FVQ-GARMA with automatically extracted excitation peak positions (AFVQ), and the basic 4.8 kbps CELP method (CELP). The test results are shown in FIG. 13.

[0055] This method was better than AFVQ, confirming improved robustness against errors in extracting the excitation peak positions. In the comparison with MFVQ, the sentences for which this method was not preferred had a partly reverberant quality. In terms of variation in sound quality, this method was confirmed to be more stable than MFVQ. If improvements to excitation peak position extraction and interpolation bring it to the same level as MFVQ, quality equivalent to 4.8 kbps CELP can be expected.

【００５６】[0056]

[Effects of the Invention] As described above, according to the invention of claim 1, the better of two encoding methods is selected, so better encoding is achieved than with a single encoding method.

[0057] In the invention of claim 2, the encoding means selecting means selects whichever of the first code search means and the second code search means gives the smaller distortion between the synthesized speech and the input speech, so decoded speech quality improves for speakers whom the excitation model fits poorly.

[0058] The adaptive excitation codebook of the invention of claim 3 stores the first and second quantized excitation signals as adaptive excitation signals, and the second code search means selects and uses whichever gives the smaller distortion between the synthesized speech and the input speech, so decoded speech quality improves.

[0059] In the speech encoding/decoding apparatus of the inventions of claims 4 and 5, the spectrum codebook, the excitation model codebook, and the driving excitation codebook of the encoding unit and the decoding unit have a plurality of sub-codebooks created for different pitch periods, and these sub-codebooks are switched according to the pitch period, so decoded speech quality improves.

[0060] In the speech encoding/decoding apparatus of the inventions of claims 6 and 7, the excitation start position in the first frame of a voiced sound is output to the decoding unit in the unvoiced frame preceding the voiced frame, and the first decoding means of the decoding unit generates the decoded speech with the first quantized excitation signal synchronized to the excitation start position. Decoded speech identical to the synthesized speech of the encoding unit is therefore generated without increasing the amount of data transmitted for voiced frames, and decoded speech quality improves.

[Brief description of drawings]

FIG. 1 is a configuration diagram showing the speech encoding/decoding apparatus according to Embodiment 1 of the present invention.

FIG. 2 is a configuration diagram showing the speech encoding/decoding apparatus according to Embodiment 1 of the present invention.

FIG. 3 is a configuration diagram showing the adaptive excitation codebook according to Embodiment 1 of the present invention.

FIG. 4 is a configuration diagram showing the adaptive excitation codebook according to Embodiment 3 of the present invention.

FIG. 5 is a configuration diagram showing the MA codebook according to Embodiment 5 of the present invention.

FIG. 6 is a configuration diagram showing the speech encoding/decoding apparatus according to Embodiment 6 of the present invention.

FIG. 7 is a configuration diagram showing the speech encoding/decoding apparatus according to Embodiment 6 of the present invention.

FIG. 8 is a configuration diagram showing the speech encoding/decoding apparatus of the evaluation experiment based on the present invention.

FIG. 9 is a configuration diagram showing the speech encoding/decoding apparatus of the evaluation experiment based on the present invention.

FIG. 10 is a configuration diagram showing the speech encoding/decoding apparatus of the evaluation experiment based on the present invention.

FIG. 11 is a diagram showing the conditions of the evaluation experiment based on the present invention.

FIG. 12 is a diagram showing SNRseg and CD in the evaluation experiment based on the present invention.

FIG. 13 is a diagram showing the results of the evaluation experiment based on the present invention.

FIG. 14 is a configuration diagram showing a conventional speech encoding/decoding apparatus.

FIG. 15 is a configuration diagram showing a conventional speech encoding/decoding apparatus.

FIG. 16 is a diagram explaining the operation of the code search means of a conventional speech encoding/decoding apparatus.

FIG. 17 is a diagram explaining the operation of the decoding means of a conventional speech encoding/decoding apparatus.

FIG. 18 is a diagram explaining a problem of a conventional speech encoding/decoding apparatus.

[Explanation of symbols]

1 input speech
2 driving excitation codebook
3 driving excitation codeword
4 driving excitation generating means
5 driving excitation signal
6 adaptive excitation codebook
7 adaptive excitation signal
8 second code search means
9 first quantized excitation signal
10 second quantized excitation signal
11 AR codebook
12 AR codeword
13 second code search result
14 pitch period extracting means
15 pitch period
16 encoding means selecting means
17 encoding means selection signal
18 encoding result
19 excitation start position extracting means
20 excitation start position
21 excitation model codebook
22 excitation model codeword
23 excitation model generating means
24 first quantized excitation signal
25 first code search means
26 MA codebook
27 MA codeword
28 first code search result
29 excitation model codebook
30 excitation model codeword
31 excitation model generating means
32 first quantized excitation signal
33 first decoding means
34 MA codebook
35 MA codeword
36 first quantized excitation signal
37 decoded speech
38 decoding means selecting means
39 encoding result
40 adaptive excitation codebook
41 adaptive excitation signal
42 second quantized excitation signal
43 second decoding means
44 AR codebook
45 AR codeword
46 driving excitation codebook
47 driving excitation codeword
48 driving excitation generating means
49 driving excitation signal
50 excitation storage means
51 first excitation storage means
52 second excitation storage means
53 switching means
54 sub-codebook
55 sub-codebook switching means
56 encoding result
57 code search means
58 decoding means
59 quantized excitation signal
60 quantized excitation signal

Claims

1. An encoding apparatus that searches for a model minimizing the distortion between an input signal and a synthesized signal and encodes the input signal using the search result, the apparatus comprising: pitch period extraction means for extracting a pitch period from the input signal; first encoding means for encoding the input signal by selecting, for the pitch period extracted by the pitch period extraction means, one model suited to the input signal from a plurality of models; second encoding means for encoding the input signal by a method different from that of the first encoding means; and selection means for comparing the encoding results of the first encoding means and the second encoding means and, based on the comparison result, selecting and outputting one of the two encoding results.
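For illustration only (this sketch is not part of the claims), the selection step of claim 1 can be pictured as follows. The squared-error distortion measure, the tie-breaking rule, and the names `squared_error` and `select_encoding` are all assumptions made for the example; the claim itself only requires that the two encoding results be compared and one of them output.

```python
def squared_error(target, synthesized):
    """Sum of squared sample differences (an assumed distortion measure)."""
    return sum((t - s) ** 2 for t, s in zip(target, synthesized))


def select_encoding(input_signal, first_result, second_result):
    """Pick the encoding result whose synthesized signal is closer to the input.

    Each result is a (code, synthesized_signal) pair. Returns a tuple
    (selection, code, distortion), where selection is 0 for the first
    encoding means and 1 for the second.
    """
    d1 = squared_error(input_signal, first_result[1])
    d2 = squared_error(input_signal, second_result[1])
    # Ties go to the first encoding means (an arbitrary choice for this sketch).
    if d1 <= d2:
        return 0, first_result[0], d1
    return 1, second_result[0], d2
```

The returned selection index plays the role of the signal that tells a decoder which encoding means produced the frame.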

2. A speech encoding/decoding apparatus that separates input speech into spectral parameters of the vocal tract and an excitation signal and performs encoding and decoding for each frame of fixed time length, wherein the encoding unit comprises:
pitch period extraction means for extracting a pitch period from the input speech and outputting it to excitation model generation means and driving excitation generation means;
a spectrum codebook in which a plurality of representative spectral parameters are stored as spectrum codewords;
an excitation model codebook in which a plurality of representative parameters of an excitation model representing an excitation signal of one pitch period are stored as excitation model codewords;
the excitation model generation means, which repeats at the pitch period the one-pitch-period excitation signal generated from an excitation model codeword in the excitation model codebook and outputs the result to first code search means as a first quantized excitation signal;
the first code search means, which searches for the combination of excitation model codeword and spectrum codeword that minimizes the distortion between the input speech and the synthesized speech generated from the first quantized excitation signal and a spectrum codeword in the spectrum codebook, outputs the search result, together with the distortion at that time, to encoding means selection means as a first code search result, and outputs the first quantized excitation signal of that combination to an adaptive excitation codebook;
the adaptive excitation codebook, in which the first quantized excitation signal output by the first code search means, or the second quantized excitation signal output by second code search means, in the preceding frame is stored as an adaptive excitation signal;
a driving excitation codebook in which a plurality of signals prepared in advance are stored as driving excitation codewords;
the driving excitation generation means, which generates a driving excitation signal by repeating a driving excitation codeword in the driving excitation codebook at the pitch period;
the second code search means, which generates a second quantized excitation signal from the adaptive excitation signal and the driving excitation signal, selects from the spectrum codebook the spectrum codeword that minimizes the spectral distortion in the frame, generates synthesized speech using that spectrum codeword and the second quantized excitation signal, searches for the driving excitation codeword that minimizes the distortion between the synthesized speech and the input speech, outputs the search result, together with the distortion at that time, to the encoding means selection means as a second code search result, and outputs the second quantized excitation signal to the adaptive excitation codebook; and
the encoding means selection means, which selects whichever of the first code search result and the second code search result has the smaller distortion, outputs it as the encoding result of the frame, and outputs an encoding means selection signal indicating which code search result was selected;
and wherein the decoding unit comprises:
the same spectrum codebook as the encoding unit;
the same excitation model codebook as the encoding unit;
the same excitation model generation means as the encoding unit, which generates and outputs the first quantized excitation signal from the excitation model codeword in the excitation model codebook corresponding to the encoding result input from the encoding unit and from the pitch period input from the encoding unit;
the same adaptive excitation codebook as the encoding unit;
the same driving excitation codebook as the encoding unit;
first decoding means for generating decoded speech using the first quantized excitation signal and the spectrum codeword in the spectrum codebook corresponding to the encoding result input from the encoding unit, and for outputting the first quantized excitation signal to the adaptive excitation codebook;
driving excitation generation means for generating a driving excitation signal from the driving excitation codeword in the driving excitation codebook corresponding to the encoding result and from the pitch period;
second decoding means for generating a second quantized excitation signal from the adaptive excitation signal in the adaptive excitation codebook and the driving excitation signal output from the driving excitation generation means, generating decoded speech from the second quantized excitation signal and the spectrum codeword, and outputting the second quantized excitation signal to the adaptive excitation codebook; and
decoding means selection means for selecting the decoded speech of the first decoding means or of the second decoding means according to the encoding means selection signal input from the encoding unit.
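For illustration only (not part of the claims), the excitation model generation and first code search of claim 2 might be sketched as below. Several things here are assumptions: the spectrum codeword is treated as a set of MA (FIR) filter coefficients, which the MA codebook in the list of reference signs suggests but the claim does not state; distortion is plain squared error; and the function names are hypothetical.

```python
def repeat_at_pitch(one_period, pitch_period, frame_length):
    """Tile a one-pitch-period waveform across a frame."""
    # Fit the stored waveform to exactly pitch_period samples
    # (truncate if longer, zero-pad if shorter).
    cycle = (list(one_period) + [0.0] * pitch_period)[:pitch_period]
    return [cycle[n % pitch_period] for n in range(frame_length)]


def ma_synthesize(excitation, coeffs):
    """FIR synthesis: y[n] = sum over k of coeffs[k] * x[n - k]."""
    out = []
    for n in range(len(excitation)):
        y = 0.0
        for k, b in enumerate(coeffs):
            if n - k >= 0:
                y += b * excitation[n - k]
        out.append(y)
    return out


def first_code_search(input_speech, pitch_period, model_codebook, spectrum_codebook):
    """Exhaustively search (model codeword, spectrum codeword) pairs.

    Returns (distortion, model_index, spectrum_index, quantized_excitation)
    for the pair with the smallest squared error against the input speech.
    """
    best = None
    for i, model in enumerate(model_codebook):
        excitation = repeat_at_pitch(model, pitch_period, len(input_speech))
        for j, coeffs in enumerate(spectrum_codebook):
            synthesized = ma_synthesize(excitation, coeffs)
            dist = sum((s - t) ** 2 for s, t in zip(synthesized, input_speech))
            if best is None or dist < best[0]:
                best = (dist, i, j, excitation)
    return best
```

The returned distortion is what a selection step would compare against the distortion reported by the second code search, and the returned excitation is what would be written to the adaptive excitation codebook.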

3. The speech encoding/decoding apparatus according to claim 2, wherein: the adaptive excitation codebooks of the encoding unit and the decoding unit each comprise first excitation storage means for storing the first quantized excitation signal when the first code search means is selected, second excitation storage means for storing the second quantized excitation signal when the second code search means is selected, and switching means for switching between the first quantized excitation signal and the second quantized excitation signal and outputting one of them as the adaptive excitation signal; the encoding unit comprises second code search means that generates a second quantized excitation signal from the adaptive excitation signal and the driving excitation signal, generates synthesized speech using the second quantized excitation signal and an AR codeword, selects the adaptive excitation signal that reduces the distortion between the synthesized speech and the input speech, and includes the selection result in the second code search result that it outputs; and the decoding unit comprises second decoding means that, according to the encoding result input from the encoding unit, selects either the first quantized excitation signal or the second quantized excitation signal in the adaptive excitation codebook and uses it as the adaptive excitation signal.
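For illustration only (not part of the claims), the two storage means and the switching means of claim 3 can be modeled as a small class. The class name, method names, and the string keys `"first"` and `"second"` are hypothetical; the claim does not prescribe any particular data layout.

```python
class AdaptiveExcitationCodebook:
    """Two excitation stores plus a switch, loosely modeled on claim 3."""

    def __init__(self):
        self.first_store = []   # most recent first quantized excitation signal
        self.second_store = []  # most recent second quantized excitation signal

    def store(self, which, quantized_excitation):
        """Save a quantized excitation signal in the store for the selected search."""
        if which == "first":
            self.first_store = list(quantized_excitation)
        elif which == "second":
            self.second_store = list(quantized_excitation)
        else:
            raise ValueError("which must be 'first' or 'second'")

    def adaptive_signal(self, which):
        """Switching step: return a copy of the chosen store as the adaptive signal."""
        if which == "first":
            return list(self.first_store)
        if which == "second":
            return list(self.second_store)
        raise ValueError("which must be 'first' or 'second'")
```

In this picture an encoder would try both stores during the second code search and transmit which one gave the lower distortion, so that a decoder holding the same two stores can make the same choice.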

4. A speech encoding/decoding apparatus that separates input speech into spectral parameters of the vocal tract and an excitation signal and performs encoding and decoding for each frame of fixed time length, wherein the encoding unit comprises: at least one of a spectrum codebook in which a plurality of representative spectral parameters are stored as spectrum codewords, an excitation model codebook in which a plurality of representative parameters of an excitation model representing an excitation signal of one pitch period are stored as excitation model codewords, and a driving excitation codebook in which a plurality of signals prepared in advance are stored as driving excitation codewords; a plurality of sub-codebooks corresponding to pitch periods, provided in at least one of the spectrum codebook, the excitation model codebook, and the driving excitation codebook of the encoding unit; and sub-codebook switching means for switching the sub-codebook used in the encoding/decoding processing according to the pitch period input from pitch period extraction means; and wherein the decoding unit comprises the same spectrum codebook, excitation model codebook, or driving excitation codebook as the encoding unit, in which the sub-codebook is switched according to the pitch period input from the decoding unit.
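For illustration only (not part of the claims), switching sub-codebooks by pitch period as in claim 4 could look like the sketch below. The boundary values, the class name `SwitchedCodebook`, and the use of pitch-period ranges to index the sub-codebooks are assumptions; the claim says only that the sub-codebooks correspond to pitch periods.

```python
from bisect import bisect_right


class SwitchedCodebook:
    """A codebook split into sub-codebooks, one per pitch-period range."""

    def __init__(self, boundaries, sub_codebooks):
        # boundaries: ascending pitch periods (in samples) that separate ranges.
        # N boundaries define N + 1 ranges, so N + 1 sub-codebooks are needed.
        if len(sub_codebooks) != len(boundaries) + 1:
            raise ValueError("need exactly len(boundaries) + 1 sub-codebooks")
        if list(boundaries) != sorted(boundaries):
            raise ValueError("boundaries must be in ascending order")
        self.boundaries = list(boundaries)
        self.sub_codebooks = list(sub_codebooks)

    def select(self, pitch_period):
        """Return the sub-codebook whose range contains pitch_period."""
        return self.sub_codebooks[bisect_right(self.boundaries, pitch_period)]
```

Because the decoder receives the same pitch period, it can run the same `select` call and index into an identical sub-codebook without any extra side information.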

5. A speech encoding/decoding apparatus that separates input speech into spectral parameters and an excitation signal and encodes each frame of fixed time length, wherein the encoding unit comprises:
pitch period extraction means for extracting a pitch period from the input speech and outputting it to excitation model generation means, driving excitation generation means, a spectrum codebook, and an excitation model codebook;
the spectrum codebook, in which a plurality of representative spectral parameters are stored as spectrum codewords;
the excitation model codebook, in which a plurality of representative excitation model parameters representing an excitation signal of one pitch period are stored as excitation model codewords;
a plurality of sub-codebooks corresponding to pitch periods, provided in the spectrum codebook or the excitation model codebook;
sub-codebook switching means for switching the sub-codebook used in the encoding/decoding processing according to the pitch period input from the pitch period extraction means;
the excitation model generation means, which repeats at the pitch period the one-pitch-period excitation signal generated from an excitation model codeword in the excitation model codebook and outputs the result to code search means as a quantized excitation signal; and
the code search means, which searches for the combination of excitation model codeword and spectrum codeword that minimizes the distortion between the input speech and the synthesized speech generated from the quantized excitation signal and a spectrum codeword in the spectrum codebook, and outputs the search result as a code search result;
and wherein the decoding unit comprises:
the same spectrum codebook as the encoding unit;
the same excitation model codebook as the encoding unit;
the same excitation model generation means as the encoding unit, which generates a quantized excitation signal from the excitation model codeword in the excitation model codebook corresponding to the encoding result input from the encoding unit and from the pitch period input from the encoding unit, and outputs it to the decoding unit; and
decoding means for generating decoded speech using the quantized excitation signal and the spectrum codeword in the spectrum codebook corresponding to the encoding result input from the encoding unit, and for outputting the quantized excitation signal to the adaptive excitation codebook.

6. The speech encoding/decoding apparatus according to claim 2, wherein: the encoding unit comprises excitation start position extraction means that, when the frame is the first frame to change from an unvoiced frame to a voiced frame, extracts from the input speech the position at which the voiced sound starts as an excitation start position and outputs it to the decoding unit in the unvoiced frame preceding that frame; and the decoding unit comprises first decoding means that generates decoded speech with the first quantized excitation signal synchronized to the excitation start position input from the encoding unit.
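For illustration only (not part of the claims), synchronizing the quantized excitation with the excitation start position in claim 6 might be sketched as follows. Filling the samples before the start position with zeros is an assumption of this example, as is the function name; the claim requires only that the excitation be synchronized with the transmitted start position.

```python
def excitation_from_start(one_period, pitch_period, start, frame_length):
    """Build a frame whose pitch-periodic excitation begins at sample `start`."""
    # Fit the stored waveform to exactly pitch_period samples.
    cycle = (list(one_period) + [0.0] * pitch_period)[:pitch_period]
    frame = []
    for n in range(frame_length):
        if n < start:
            frame.append(0.0)  # assumed silence before the voiced onset
        else:
            frame.append(cycle[(n - start) % pitch_period])
    return frame
```

Aligning the first pitch cycle to the transmitted start position keeps the decoder's pitch pulses in phase with the onset of voicing in the input speech.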

7. A speech encoding/decoding apparatus that separates input speech into spectral parameters and an excitation signal and encodes each frame of fixed time length, wherein the encoding unit comprises:
pitch period extraction means for extracting a pitch period from the input speech and outputting it to excitation model generation means and driving excitation generation means;
a spectrum codebook in which a plurality of representative spectral parameters are stored as spectrum codewords;
an excitation model codebook in which a plurality of representative excitation model parameters representing an excitation signal of one pitch period are stored as excitation model codewords;
the excitation model generation means, which repeats at the pitch period the one-pitch-period excitation signal generated from an excitation model codeword in the excitation model codebook and outputs the result to code search means as a quantized excitation signal;
the code search means, which searches for the combination of excitation model codeword and spectrum codeword that minimizes the distortion between the input speech and the synthesized speech generated from the quantized excitation signal and a spectrum codeword in the spectrum codebook, and outputs the search result as a code search result; and
excitation start position extraction means that, when the frame is the first frame to change from an unvoiced frame to a voiced frame, extracts from the input speech the position at which the voiced sound starts as an excitation start position and outputs it to the decoding unit in the unvoiced frame preceding that frame;
and wherein the decoding unit comprises:
the same spectrum codebook as the encoding unit;
the same excitation model codebook as the encoding unit;
the same excitation model generation means as the encoding unit, which generates a quantized excitation signal from the excitation model codeword in the excitation model codebook corresponding to the encoding result input from the encoding unit and from the pitch period input from the encoding unit, and outputs it to the decoding unit; and
decoding means that, when the frame is the first frame to change from an unvoiced frame to a voiced frame, generates decoded speech using the quantized excitation signal synchronized with the excitation start position input from the encoding unit and the spectrum codeword in the spectrum codebook corresponding to the encoding result input from the encoding unit, and outputs the quantized excitation signal to the adaptive excitation codebook.