EP0308817A2

EP0308817A2 - Method for converting channel vocoder parameters into LPC vocoder parameters

Info

Publication number: EP0308817A2
Application number: EP88115139A
Authority: EP
Inventors: Hans Dipl.-Ing. Brandl
Original assignee: Siemens AG; Siemens Corp
Current assignee: Siemens AG; Siemens Corp
Priority date: 1987-09-23
Filing date: 1988-09-15
Publication date: 1989-03-29
Also published as: EP0308817A3; DE3732047A1; DE3732047C2

Abstract

The invention relates to a method for converting channel vocoder parameters into LPC vocoder parameters, the LPC vocoder parameters being calculated from the smoothed power spectrum of the channel vocoder parameters by an inverse discrete Fourier transform. According to the invention, for a prescribed number of channels of the channel vocoder and a prescribed number of parameters of the LPC vocoder, matrix elements are calculated from the quantities that are constant in this case, and the LPC vocoder parameters are then computed from the channel vocoder parameters by matrix multiplications.

Description

The invention relates to a method according to the preamble of claim 1.

Digital narrowband communication networks with low data rates (1-2 kbit/s) are currently being planned. The coding methods they use are based either on the channel vocoder principle or on linear prediction (LPC vocoder). Communication between the two types of vocoder is only possible if suitable transcoding of the data takes place at their interface.

The converter required for this should be as inexpensive as possible and should degrade speech quality as little as possible.

One way to build such a converter is to transform the speech data back into a speech signal and then re-encode that signal.

This method is very costly, since two analysis units and two synthesis units are required. Analysis errors also degrade the speech quality. This degradation can be avoided by transcoding the data of the different vocoders directly. Direct transcoding is possible because the channel vocoder and the LPC vocoder use very similar synthesis principles.

In both vocoders, the speech signal is generated by passing an excitation signal through a variable filter. For voiced sounds the excitation signal is a pulse train; for unvoiced sounds it is white noise. The excitation parameters specify the pulse frequency and the excitation mode (voiced or unvoiced). The time-varying transfer characteristic of the filter models the time-varying resonance behavior of the human vocal tract. This behavior changes slowly, and the filter is readjusted with new filter parameters every 10 to 20 ms. The task of the speech signal analysis in a vocoder is to extract the excitation parameters and the filter parameters from a speech signal. The LPC vocoder and the channel vocoder differ essentially in the structure of the filter: LPC assumes an all-pole filter, whereas the channel vocoder assumes a filter bank. Consequently, the analysis methods for determining the filter parameters differ, and different filter parameters are transmitted in the two networks. The excitation parameters, by contrast, are essentially the same.

What is needed, therefore, is a transcoding method that converts the filter parameters of the filter bank of a channel vocoder into the filter parameters of the all-pole filter of an LPC vocoder.

In communications-theory terms, the channel vocoder parameters (or coefficients) usually represent a spectrum sampled at non-uniformly spaced frequencies. The power spectrum is calculated from this amplitude spectrum and transformed into the autocorrelation function (AKF) by means of the Fourier transform. The corresponding set of LPC vocoder parameters can then be calculated from the AKF in a known manner using the usual methods, e.g. the Levinson recursion (see H. Hermansky, B. Hanson, H. Witka: "Perceptually based Predictive Analysis of Speech", ICASSP 85 proceedings, p. 13.10).
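The last step of this chain, from the AKF to the LPC parameter set, is the Levinson recursion named above. A minimal sketch of the textbook Levinson-Durbin form follows; it is not taken from the patent, and the sign convention (prediction-error filter A(z) = 1 + a_1 z^-1 + ... + a_M z^-M) and the function name are assumptions.

```python
def levinson_durbin(r):
    """Levinson-Durbin recursion: autocorrelation r[0..M] -> LPC predictor.

    Returns (a, refl, err): a = [1, a_1, ..., a_M] are the coefficients of the
    prediction-error filter A(z) = 1 + sum_m a_m z^-m, refl are the reflection
    coefficients and err is the final prediction-error power.
    """
    order = len(r) - 1
    a = [1.0] + [0.0] * order
    err = float(r[0])
    refl = []
    for m in range(1, order + 1):
        # correlation between the order-(m-1) residual and lag m
        acc = r[m] + sum(a[j] * r[m - j] for j in range(1, m))
        k = -acc / err
        refl.append(k)
        prev = a[:]  # keep the order-(m-1) solution for the update
        for j in range(1, m):
            a[j] = prev[j] + k * prev[m - j]
        a[m] = k
        err *= 1.0 - k * k  # error power shrinks at each order
    return a, refl, err
```

For LPC-10, r would hold the 11 values r_0 ... r_10. For the short sequence r = [1.0, 0.5, 0.25] the sketch gives a_1 = -0.5, a_2 = 0 and a prediction-error power of 0.75.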

This direct transformation requires considerable technical effort: powerful real-time processors are needed to compute the spectra and correlation functions.

The object of the invention is to specify a method for transcoding channel vocoder parameters into LPC vocoder parameters that achieves high accuracy with relatively few arithmetic operations.

This object is achieved according to the invention by the features specified in claim 1.

A known transcoding method is first explained below in mathematical terms.

The starting point is the set of channel vocoder parameters, available for example as a power spectrum (see FIG 1). In a channel vocoder, this power spectrum is only available in piecewise-constant form b_k, with jumps at the transitions from b_k to b_{k+1}. In FIG 1 these parameters b_k are shown as energy values e_j, where e_j is the energy in channel number j. As is generally known, the channel energy corresponds to the power within a 20 ms interval (the interval after which new filter parameters are set). This interval is also the transformation interval.

From this "raw" spectrum b_k, a smoothed spectrum a_k (see FIG 2) is formed by convolution with a smoothing function g(i, s). The smoothing function g is an even function, g(i, s) = g(-i, s), with argument i and spread s, where the spread determines the width of g.

Gaussian bell curves or similar functions, for example, are suitable as the smoothing function g. The following function is given as an example of a Gaussian bell curve:

Other possible smoothing functions g are the low-pass functions known from filter theory and digital signal processing. In these cases, the spread s defines the cutoff frequency of the respective low-pass filter.

For the special case of a Dirac pulse, g(i, s) = δ(i), i.e. g = 1 for i = 0 and g = 0 otherwise, b_k would be mapped unchanged onto the smoothed spectrum a_k.
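The two kernels discussed above can be illustrated with a small sketch. It assumes the unnormalized Gaussian form g(i, s) = exp(-i^2 / (2 s^2)) (the patent's own Gaussian formula is not reproduced in this text) and a fixed spread s; the helper names are hypothetical.

```python
import math

def gaussian_kernel(i, s):
    """Even Gaussian bell curve g(i, s) = exp(-i^2 / (2 s^2)); s is the spread."""
    return math.exp(-0.5 * (i / s) ** 2)

def dirac_kernel(i, s=None):
    """Discrete Dirac pulse: 1 for i == 0, 0 otherwise; the spread s is ignored."""
    return 1.0 if i == 0 else 0.0

def smooth_fixed_spread(b, g, s):
    """Convolve the raw spectrum b with kernel g at a fixed spread s (no normalization)."""
    n = len(b)
    return [sum(g(k - l, s) * b[l] for l in range(n)) for k in range(n)]

raw = [0.0, 0.0, 4.0, 4.0, 1.0, 1.0]                     # piecewise-constant "raw" spectrum
print(smooth_fixed_spread(raw, dirac_kernel, None))     # Dirac: raw spectrum unchanged
print(smooth_fixed_spread(raw, gaussian_kernel, 1.0))   # Gaussian: jumps softened
```

Because the Gaussian here is not normalized, it also raises the overall level; a practical smoother would scale the weights, which is one of the roles the text later assigns to the parameter u.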

When a real speech spectrum b_k is smoothed, the spread s can be made a function of the current spectral line. In that case, a larger spread s is chosen for the smoothing function g(i, s) at higher frequencies, where the channels in b_k are wider, than at lower frequencies. This allows the smoothing to be matched to the pitch perception of the human ear (Bark scale). The "euphony" of the synthesized speech can be tuned empirically through the choice of the spread or spreads s.

The following formula (1) thus results for calculating the smoothed spectrum a_k from the "raw" spectrum b_k:

with
g: smoothing function
u: smoothing width (normalization)
u · k: spread
a_k: k-th coefficient of the smoothed power spectrum
N: number of spectral coefficients
b_l: l-th coefficient of the raw spectrum
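A hypothetical reading of formula (1), with a Gaussian kernel whose spread u · k grows with the spectral line k, can be sketched as follows. Normalizing each output line's weights to sum to 1 is an assumption; the role of u as a normalization constant in the original formula is not modeled, and the function name is made up for illustration.

```python
import math

def smooth_variable_spread(b, u):
    """Sketch of formula (1): a_k = sum_l w_kl * b_l with spread u * k.

    b holds b_1 ... b_N as a 0-based list. Line k (1-based) is smoothed with a
    Gaussian of spread u * k, so the higher (wider) channels are smoothed more.
    Each output line's weights are normalized to sum to 1 (an assumption).
    """
    n = len(b)
    a = []
    for k in range(1, n + 1):
        s = u * k  # spread grows with frequency (Bark-like behaviour)
        w = [math.exp(-0.5 * ((k - l) / s) ** 2) for l in range(1, n + 1)]
        total = sum(w)
        a.append(sum(wl * bl for wl, bl in zip(w, b)) / total)
    return a
```

With this normalization a flat raw spectrum stays flat, while an isolated peak is spread over its neighbours, more widely at high line numbers than at low ones.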

The LPC coefficients are generally calculated from the short-term autocorrelation function (approx. 20 ms) of the speech signal, AKF for short. This AKF, i.e. its correlation coefficients r_i, can also be determined from the power spectrum of the speech signal by the inverse discrete Fourier transform.

The correlation coefficients r_i are then given by the following equation (2):

i = 0, 1, ..., M, giving M + 1 correlation coefficients (other symbols as in formula (1)).
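As a hedged illustration of this step, the sketch below assumes that the N smoothed lines a_k form a one-sided power spectrum sampled at the midpoint frequencies ω_k = π(k + 0.5)/N (k = 0, ..., N - 1) and evaluates the inverse DFT of the real, even spectrum as a cosine sum. The frequency grid, the 1/N scaling and the function name are assumptions, not the patent's formula (2).

```python
import math

def power_spectrum_to_acf(a, order):
    """r_i = (1/N) * sum_k a_k * cos(omega_k * i) for i = 0 .. order.

    a: one-sided power spectrum with N lines, line k at omega_k = pi*(k + 0.5)/N.
    Only the cosine part is needed because the power spectrum is real and even.
    """
    n = len(a)
    omegas = [math.pi * (k + 0.5) / n for k in range(n)]
    return [sum(ak * math.cos(w * i) for ak, w in zip(a, omegas)) / n
            for i in range(order + 1)]
```

On this grid a flat (white) spectrum maps to r_0 = mean power and r_i close to 0 for i > 0, which is the expected autocorrelation of white noise.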

Substituting formula (1) into formula (2) and applying the commutative law (interchanging the order of summation) gives formula (3):
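The interchange of summation can be checked numerically: smoothing first and then transforming gives the same r_i as summing over the raw lines b_l outermost, where each b_l multiplies a kernel that does not depend on the signal. The sketch assumes normalized Gaussian weights with spread u·(k + 1) (0-based k) and the midpoint cosine grid ω_k = π(k + 0.5)/N; all names are hypothetical.

```python
import math

def smoothing_weights(n, u):
    """Normalized Gaussian weights w[k][l]; line k (0-based) uses spread u*(k+1)."""
    w = []
    for k in range(n):
        s = u * (k + 1)
        row = [math.exp(-0.5 * ((k - l) / s) ** 2) for l in range(n)]
        total = sum(row)
        w.append([x / total for x in row])
    return w

def cos_basis(n, i, k):
    """Inverse-DFT term for line k and lag i on the grid omega_k = pi*(k+0.5)/n."""
    return math.cos(math.pi * (k + 0.5) * i / n) / n

def acf_smooth_then_idft(b, w, order):
    """Formulas (1) and (2) in sequence: smooth b into a, then inverse DFT of a."""
    n = len(b)
    a = [sum(w[k][l] * b[l] for l in range(n)) for k in range(n)]
    return [sum(a[k] * cos_basis(n, i, k) for k in range(n)) for i in range(order + 1)]

def acf_swapped(b, w, order):
    """Formula (3) form: sum over l outermost; the inner sum no longer depends on b."""
    n = len(b)
    return [sum(b[l] * sum(w[k][l] * cos_basis(n, i, k) for k in range(n))
                for l in range(n))
            for i in range(order + 1)]
```

Both functions return the same vector up to rounding; the second form is the one that makes the precomputation of signal-independent terms possible.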

The N spectral lines b_l of the raw spectrum can be derived from the channel energy values e_j (see FIG 1).

In real vocoders, the number of channels, and thus the number of channel energy values e_j, is about 16 to 18. With a number of spectral coefficients N of about 256, the coefficients b_l of the "raw" power spectrum can be represented as follows:

$$b_l = e_j \quad \text{for } l = m_j, \dots, m_{j+1}-1 \tag{4}$$

m_j: index of the first spectral line of channel j
m_{j+1} - 1: index of the last spectral line of channel j
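Formula (4) simply repeats each channel energy over the spectral lines of its channel. A minimal sketch, assuming 0-based indices and a list of P + 1 boundaries whose last entry is N (one past the final line); the function name is hypothetical.

```python
def channel_energies_to_raw_spectrum(e, m):
    """b_l = e_j for l = m_j ... m_{j+1}-1 (formula (4)), 0-based indices.

    e: channel energies e_1 ... e_P as a list of length P.
    m: P+1 boundaries; m[j] is the first line of channel j, m[P] == N.
    """
    if len(m) != len(e) + 1:
        raise ValueError("need exactly P+1 channel boundaries")
    b = []
    for j, ej in enumerate(e):
        # channel j covers lines m[j] .. m[j+1]-1, all set to its energy
        b.extend([ej] * (m[j + 1] - m[j]))
    return b
```

For example, three channels with energies [3.0, 1.0, 2.0] and boundaries [0, 2, 5, 9] give a piecewise-constant raw spectrum of N = 9 lines whose channels widen towards higher frequencies, as in FIG 1.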

Substituting formula (4) into formula (3) gives the following general equation (5) for calculating the AKF from the vocoder channel energy values e_j, j = 1, ..., P:

m_1 = 1: first spectral line of the first channel
m_{P+1} - 1 = N: last spectral line of the last channel

The transcoding method according to the invention is explained below.

In formula (5), all factors following the vocoder channel energy values e_j are constants, i.e. they do not depend on the speech signal.

For a given frequency and time grid of the channel vocoder and LPC vocoder parameters, formula (5) can be rewritten as a matrix multiplication:

$$r_i = \sum_{j=1}^{P} c_{ij}\, e_j \tag{6}$$

i = 0 ... M: index of the AKF coefficients
P: number of channels

where the matrix elements c_ij are defined by formula (7), or in matrix notation

$$\mathbf{r} = C \times \mathbf{E}$$

with
r = (r_0, ..., r_M)^T: AKF vector
C: matrix with the elements c_ij from formula (7)
E = (e_1, ..., e_P)^T: channel vocoder energy vector

In the method according to the invention, the elements of matrix C are calculated only once for a given combination of vocoders. Thereafter, transcoding the speech parameters requires only multiplications of the matrix C by the energy vectors E, each of which contains one set of channel vocoder parameters.
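The two phases, a one-time computation of C followed by a cheap matrix-vector product per frame, can be sketched as below. The concrete form of the matrix elements is a hypothetical stand-in for formula (7), built from the same assumptions as a cosine-sum inverse DFT: normalized Gaussian weights with spread u·(k + 1), midpoint grid ω_k = π(k + 0.5)/N, 1/N scaling and 0-based channel boundaries with m[-1] == N. Function names are illustrative only.

```python
import math

def smoothing_weights(n, u):
    """Normalized Gaussian weights w[k][l]; line k (0-based) uses spread u*(k+1)."""
    w = []
    for k in range(n):
        s = u * (k + 1)
        row = [math.exp(-0.5 * ((k - l) / s) ** 2) for l in range(n)]
        total = sum(row)
        w.append([x / total for x in row])
    return w

def build_matrix(m, n, u, order):
    """Compute c_ij once for a fixed vocoder pair (hypothetical form of formula (7)).

    m: P+1 channel boundaries (0-based, m[0] == 0, m[-1] == n); rows i = 0..order.
    """
    if m[0] != 0 or m[-1] != n:
        raise ValueError("channel boundaries must span lines 0..n-1")
    w = smoothing_weights(n, u)
    # kernel[i][l]: signal-independent contribution of raw spectral line l to r_i
    kernel = [[sum(w[k][l] * math.cos(math.pi * (k + 0.5) * i / n) for k in range(n)) / n
               for l in range(n)]
              for i in range(order + 1)]
    # collapse the lines of each channel j into one matrix element c_ij
    return [[sum(kernel[i][m[j]:m[j + 1]]) for j in range(len(m) - 1)]
            for i in range(order + 1)]

def transcode_frame(c, e):
    """Per frame: r_i = sum_j c_ij * e_j, i.e. (order+1) * P multiplications."""
    return [sum(cij * ej for cij, ej in zip(row, e)) for row in c]
```

In this design the expensive double sum over spectral lines is paid once in build_matrix; each 20 ms frame then only needs transcode_frame, whose output vector r could be passed on to a Levinson recursion.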

In a practical case with, for example, P = 18 channels in the channel vocoder and the 11 autocorrelation values required for LPC-10, only about 200 multiplications and about as many additions are needed. Conventional methods require approximately 4000 arithmetic operations.
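The figure of about 200 multiplications follows directly from the size of C (M + 1 = 11 rows, P = 18 columns), assuming one multiplication per matrix element and P - 1 additions per row:

$$
\underbrace{11}_{M+1} \times \underbrace{18}_{P} = 198 \ \text{multiplications}, \qquad 11 \times (18 - 1) = 187 \ \text{additions}
$$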

A circuit arrangement for carrying out the matrix multiplication described above is explained below with reference to FIG 3.

The smoothed channel vocoder parameters a_p are applied to an input 1 of a first memory 2. For example, one set of these parameters at a time (18 values in the case of 18 channels) is written into the first memory 2.

The following arithmetic operation is to be carried out:

$$l_i = \sum_{p} c_{ip}\, a_p \tag{8}$$

with
l_i: LPC vocoder parameters (these correspond to the autocorrelation coefficients r_i in formula (6))
c_ip: transformation coefficients (matrix elements), calculated according to formula (7)
a_p: channel vocoder parameters

To transcode the parameters of a given channel vocoder into the parameters of a given LPC vocoder, the transformation coefficients c_ip of the matrix C are calculated and stored in a coefficient memory 3.

To carry out the matrix multiplication, a first counter 4 addresses the channel vocoder parameters a_p in the first memory 2 one after another. Correspondingly, the coefficients c_ip in the coefficient memory 3 are addressed by their index p.

A multiplier 5 multiplies each addressed channel vocoder parameter a_p by the addressed coefficient c_ip, and a downstream adder 6 accumulates the products. The index i of the coefficients c_ip is held constant until the index p has reached its largest value, for example 17 in formula (8). The resulting sum is written into a second memory 7 as LPC parameter l_i. A second counter 8 then increments the index i by one, and the next LPC parameter l_{i+1} is calculated. For this purpose, the second counter 8 addresses both the coefficients c_ip in the coefficient memory 3 by their index i and the LPC vocoder parameters in the second memory 7.

The two counters 4 and 8 are clocked by a clock controller 9.

A transformed (transcoded) set of LPC vocoder parameters is then available at an output 10 of the second memory 7.
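The FIG 3 data flow can be modeled in software as two nested counter loops feeding a multiply-accumulate unit. This is an illustrative sketch only: the memory layout (row i of the coefficient memory holds c_i0 ... c_i,P-1) and the function name are assumptions, and the clock controller 9 is represented only by the loop order.

```python
def matrix_multiply_circuit(a_mem, c_mem):
    """Software model of FIG 3; returns the contents of the second memory 7.

    a_mem: first memory 2, one set of channel vocoder parameters a_p.
    c_mem: coefficient memory 3, c_mem[i][p] = c_ip.
    """
    n_out = len(c_mem)          # number of LPC parameters l_i
    n_ch = len(a_mem)           # number of channels, e.g. 18
    l_mem = [0.0] * n_out       # second memory 7
    for i in range(n_out):      # second counter 8 steps the index i
        acc = 0.0               # adder 6 starts a new sum
        for p in range(n_ch):   # first counter 4 steps the index p
            acc += c_mem[i][p] * a_mem[p]   # multiplier 5 feeds adder 6
        l_mem[i] = acc          # sum written to memory 7 at address i
    return l_mem                # read out at output 10
```

For example, coefficients [[1, 2], [3, 4], [0, 1]] and parameters [1.0, 0.5] yield the three output values 2.0, 5.0 and 0.5.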

Claims

1. A method for transcoding digital channel vocoder parameters, which were obtained in the analysis part of the channel vocoder from a natural speech signal, into digital LPC vocoder parameters, which are processed in the synthesis part of the LPC vocoder into a synthetic speech signal, wherein the channel vocoder parameters are available as a power spectrum, the LPC vocoder parameters are calculated from the short-term autocorrelation function, the power spectrum is smoothed using a smoothing function (g), and the correlation coefficients of the autocorrelation function are calculated from the smoothed power spectrum using an inverse discrete Fourier transform, characterized in that, for a given number of channels of the channel vocoder and a given number of parameters of the LPC vocoder with a given frequency and time grid, matrix elements (c_ij) are calculated from the constant quantities and stored in a coefficient memory (3), so that the LPC vocoder parameters can be derived from the channel vocoder parameters by means of matrix multiplications, the parameters of each of the vocoders forming a vector.

2. The method according to claim 1, characterized in that the smoothing function (g = g(i, s)) includes a spread (s) which determines the width of the smoothing function.

3. The method according to claim 1 or claim 2, characterized in that the width (s) of the smoothing function (g) is a function of the parameters of the channel vocoder.