KR100269357B1 - Speech recognition method

Info

Publication number: KR100269357B1
Application number: KR1019980015696A
Authority: KR
Inventors: 이윤근; 김기백; 이병수; 이종석
Original assignee: 구자홍; 엘지전자주식회사
Priority date: 1998-04-30
Filing date: 1998-04-30
Publication date: 2000-10-16
Anticipated expiration: 2018-04-30
Also published as: KR19990081663A

Abstract

The present invention relates to a speech recognition method that mitigates the errors that arise when line spectral pairs (LSP) are used as the feature vector. When a speech signal is input, the LSP is first extracted and then converted into a pseudo-cepstrum by the following equation, and the pseudo-cepstrum is used as the feature vector for speech recognition.

Because the resulting feature vector remains useful even for the consonant components of speech, a speech recognizer with better performance can be implemented. The method is particularly effective on processors that cannot extract feature vectors from speech samples, and when it is applied to speech recognition in a communication device with a built-in vocoder, no separate LSP extraction step is needed, which yields a large saving in computation.

Description

Speech recognition method

The present invention relates to speech recognition and, more particularly, to a speech recognition method that mitigates the errors that arise when line spectral pairs (LSP) are used as the feature vector.

Advances in digital signal processing have shown that speech, the human means of communication, can be applied in many fields. The simplest of the speech recognition technologies that make this possible is speaker-dependent isolated-word recognition, which recognizes only the voice of the person who trained it and only speech uttered as single words (or short sentences). Many speech recognition algorithms for this purpose are already known; they can be broadly divided into a speech section detection process, a feature extraction process, and a matching process.

That is, as shown in FIG. 1, when a speech signal is input through the microphone (11), the A/D converter (12) converts it into a digital signal and outputs it to the speech section detector (13). The speech section detector (13) divides the digital speech signal into short segments (frames) and then uses each frame's energy, zero crossing rate, and duration information to detect only the section of the input signal that was actually spoken, which it outputs to the feature extractor (14). The feature extractor (14) extracts the features of the frames in the speech section, builds a test pattern for the input speech, and outputs it to the matching unit (15). The matching unit (15) compares the test pattern with each reference pattern stored in the reference data memory (16) and outputs the reference pattern whose features are most similar to the test pattern as the recognized speech. Reference patterns are stored in the reference data memory (16) in the same way: the feature extractor (14) extracts the features of the frames in the speech section to build a reference pattern, which is then stored in the reference data memory (16). This operation is repeated for every speech signal to be recognized, building a database of reference patterns in the reference data memory (16).
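The speech section detection described above can be sketched as follows. This is a minimal illustration, not the detector of FIG. 1: the function names, the frame length and hop, the energy and zero-crossing thresholds, and the minimum-duration rule are all assumptions chosen for the example.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (the ragged tail is dropped)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])

def short_time_energy(frames):
    """Sum of squared samples in each frame."""
    return np.sum(frames ** 2, axis=1)

def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs in each frame whose signs differ."""
    signs = np.sign(frames)
    signs[signs == 0] = 1  # treat exact zeros as positive
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)

def detect_speech_frames(x, frame_len=160, hop=80,
                         energy_ratio=0.05, zcr_thresh=0.25, min_frames=3):
    """Return a boolean mask marking frames judged to be speech.

    All thresholds are illustrative assumptions, not values from the patent.
    """
    frames = frame_signal(np.asarray(x, dtype=float), frame_len, hop)
    energy = short_time_energy(frames)
    zcr = zero_crossing_rate(frames)
    e_hi = energy_ratio * energy.max()
    e_lo = 0.1 * e_hi
    # High energy marks voiced speech; moderate energy with a high
    # zero crossing rate marks unvoiced (consonant-like) speech.
    active = (energy > e_hi) | ((energy > e_lo) & (zcr > zcr_thresh))
    # Duration information: discard active runs shorter than min_frames.
    mask = np.zeros_like(active)
    start = None
    for i, is_active in enumerate(np.append(active, False)):
        if is_active and start is None:
            start = i
        elif not is_active and start is not None:
            if i - start >= min_frames:
                mask[start:i] = True
            start = None
    return mask
```

Frames flagged `True` would be the ones passed on to feature extraction; a real detector would normally estimate its thresholds from background noise rather than from the signal maximum.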

Meanwhile, LPC (Linear Prediction Coding), one of the speech signal processing methods used in the feature extraction process of the feature extractor (14), processes the speech signal so as to minimize the error between the current signal predicted from past signals and the actual current signal. LPC introduces the concept of an inverse filter: the excitation that drives the vocal cords is treated as the filter input, modeled as a periodic pulse train for voiced sounds and as random noise for unvoiced sounds, and the signal produced when this input passes through the vocal tract is treated as the audible speech signal. On this basis, the speech signal is processed to find the optimal information about the inverse filter.
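As an illustration of the prediction-error minimization described above, the following sketch estimates the inverse filter coefficients with the autocorrelation method and the Levinson-Durbin recursion. The patent does not say which LPC estimation procedure is used, so this choice, the function name `lpc_autocorrelation`, and the sign convention A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p are assumptions.

```python
import numpy as np

def lpc_autocorrelation(frame, order):
    """Estimate inverse-filter coefficients a[0..p] (a[0] = 1) and the
    residual (prediction error) energy for one frame of speech."""
    x = np.asarray(frame, dtype=float)
    # Autocorrelation lags r[0..p]
    r = np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i
        acc = r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])
        k = -acc / err
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        # Prediction error energy shrinks at every stage
        err *= 1.0 - k * k
    return a, err
```

Applied to the impulse response of a known all-pole filter, the routine should return that filter's denominator coefficients, which is a convenient sanity check.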

Building on LPC theory, theoretical and experimental work on model-based speech analysis and synthesis led to the development of the PARCOR (Partial Auto-Correlation) method, LSP, and related techniques.

The LSP represents formant information in speech. Its coefficients have an ordering property (their values are arranged in order of magnitude) and are robust to distortion, so LSP is widely used in speech compression to quantize the spectral envelope, particularly when coding speech data.

That is, consider the frequency response function of the articulation filter determined by the p-th order linear prediction coefficients, and assume that the human vocal tract is an ideal resonance tube in which the glottal wave is fully reflected. The LSP coefficients then give the positions on the z-plane of the poles of two virtual filter functions, P_p(z) and Q_p(z), which are expressed as in Equation 1 below.

A_p(z) and B_p(z) in Equation 1 are related as in Equation 2 below.

B_p(z) = z^{-(p+1)} A_p(z)    [Equation 2]

Because the human vocal tract is assumed to be an ideal resonance tube, no energy is lost on reflection, so P_p(z) and Q_p(z) take the form of line spectra. All roots of P_p(z) and Q_p(z) lie on the unit circle in the z-plane, and the roots of P_p(z) and Q_p(z) alternate around the unit circle. Once all the roots of P_p(z) and Q_p(z) have been found, the LSP coefficients follow directly from them.
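The root-finding view of the LSP can be checked numerically. The sketch below assumes the conventional LSP construction, in which the two polynomials are the sum and the difference of A_p(z) and its coefficient-reversed counterpart delayed by p+1 samples. Which of the two the patent calls P_p(z) and which Q_p(z) cannot be recovered from this text, so the code uses the neutral names `sum_poly` and `diff_poly`; the function name `lpc_to_lsf` is also illustrative.

```python
import numpy as np

def lpc_to_lsf(a):
    """Convert inverse-filter coefficients a[0..p] (a[0] = 1) into line
    spectral frequencies in radians, sorted in ascending order.

    Also returns all roots of both polynomials so that the unit-circle
    property can be inspected.
    """
    a = np.asarray(a, dtype=float)
    a_ext = np.append(a, 0.0)        # A(z) padded to degree p+1
    b = a_ext[::-1]                  # coefficient-reversed, delayed copy of A(z)
    sum_poly = a_ext + b             # symmetric polynomial
    diff_poly = a_ext - b            # antisymmetric polynomial
    roots = np.concatenate([np.roots(sum_poly), np.roots(diff_poly)])
    angles = np.angle(roots)
    # Keep one root of each conjugate pair and drop the trivial roots
    # at z = 1 and z = -1.
    eps = 1e-6
    lsf = np.sort(angles[(angles > eps) & (angles < np.pi - eps)])
    return lsf, roots
```

For a stable p-th order filter this should yield p frequencies strictly between 0 and π, with the roots of the two polynomials interleaved around the unit circle.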

When speech must be recognized from speech data encoded by a vocoder, using the LSP as the speech feature vector is very advantageous in terms of computation. In particular, on a processor that cannot decode the encoded packets in real time and extract feature vectors from the speech samples, using the LSP as the feature vector simplifies the computation.

However, when the poles of the speech spectrum are not well defined, the LSP is not extracted properly, so the LSP does not serve as a good feature vector for the consonant components of speech.

The present invention is intended to solve the above problem, and its object is to provide a speech recognition method that converts the LSP into a pseudo-cepstrum and uses it as the feature vector for speech recognition.

FIG. 1 is a block diagram of a general speech recognition system.

FIG. 2 is a flowchart of the speech recognition method according to the present invention.

Explanation of reference numerals for the main parts of the drawings

11: microphone
12: A/D converter
13: speech section detector
14: feature extractor
15: matching unit
16: reference data memory

To achieve the above object, the speech recognition method according to the present invention is characterized in that, when speech is input, line spectral pairs (LSP) are first extracted and then converted into a pseudo-cepstrum, which is used as the feature vector for speech recognition.

With this method, the feature vector remains useful even for the consonant components of speech, and when the method is applied to speech recognition in a communication device with a built-in vocoder, no separate LSP extraction step is needed, which yields a large saving in computation.

Other objects, features, and advantages of the present invention will become apparent from the detailed description of the embodiments with reference to the accompanying drawings.

Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings.

FIG. 2 is a flowchart of the speech recognition method according to the present invention, taking as its embodiment the case in which the LSP coefficients are obtained from a vocoder. The vocoder is an encoder that uses source coding, and its output data consists of coefficients representing spectral information, information modeling the excitation signal of the speech, gains, and so on. In QCELP, for example, the output comprises the LSP coefficients, the codebook index and gain, and the delay and gain of the long-term predictor.

That is, when speech is input through the microphone (step 201), it is modulated with PCM (Pulse Code Modulation), μ-law PCM, or a similar scheme and then encoded by the vocoder (step 202). The speech signal encoded in step 202 is divided into short segments (frames), and the energy and zero crossing rate of each frame are measured to detect only the section that was actually spoken (step 203). As one example, the codebook gain output by the encoder can be used as the energy information.

Once the speech section has been detected in step 203, the features of the frames in that section are extracted. Because this embodiment uses a vocoder, the LSP coefficients output by the vocoder are used: the vocoder outputs the spectral parameters of the speech, such as the LSP coefficients, as part of its encoding result, so no separate feature extraction step is needed. However, because the LSP coefficients have the problem described above, they are converted into a pseudo-cepstrum by Equation 3 below (step 204).

The cepstrum of a signal is obtained, as in Equation 4 below, by taking the logarithm of the signal's spectrum and then applying the inverse Fourier transform (IFT); the values obtained are the cepstral coefficients. The term cepstrum was coined by reversing the order of the first letters of the word spectrum. Because the cepstrum is the inverse transform of a frequency-domain function, it can be regarded as a time-domain function, and one of its characteristic properties is that it separates the spectral envelope information from the fine structure in the speech signal.

Here, S(w) is the power spectrum and C_n is the n-th cepstral coefficient.
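The definition above (logarithm of the power spectrum followed by an inverse Fourier transform) can be sketched with a discrete Fourier transform. This is a minimal illustration; the function name `power_cepstrum`, the FFT size, and the small floor that guards against log(0) are assumptions.

```python
import numpy as np

def power_cepstrum(x, n_fft=512, floor=1e-12):
    """Cepstral coefficients C_n of a frame, from its power spectrum S(w)."""
    spectrum = np.abs(np.fft.fft(x, n_fft)) ** 2      # S(w) on n_fft bins
    log_spec = np.log(np.maximum(spectrum, floor))    # log S(w)
    return np.fft.ifft(log_spec).real                 # inverse transform
```

The low-order coefficients describe the smooth spectral envelope, while the high-order ones carry the fine structure, which is the separation property mentioned in the text.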

Equation 5 below shows that the cepstral distance equals the rms (root mean square) log spectral distance.
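The equality between the cepstral distance and the rms log spectral distance is an instance of Parseval's relation, and its discrete form can be verified numerically. The sketch below is only such a check, under the assumption that both quantities are computed on the same FFT grid; the function names are illustrative.

```python
import numpy as np

def log_power_spectrum(x, n_fft=256, floor=1e-12):
    """Log power spectrum of a frame on n_fft frequency bins."""
    return np.log(np.maximum(np.abs(np.fft.fft(x, n_fft)) ** 2, floor))

def cepstral_and_log_spectral_distance(x, y, n_fft=256):
    """Return (sum of squared cepstral differences,
               mean squared log-spectral difference)."""
    lx = log_power_spectrum(x, n_fft)
    ly = log_power_spectrum(y, n_fft)
    cx = np.fft.ifft(lx).real
    cy = np.fft.ifft(ly).real
    cep_dist_sq = np.sum((cx - cy) ** 2)     # over all cepstral lags
    log_spec_ms = np.mean((lx - ly) ** 2)    # square of the rms distance
    return cep_dist_sq, log_spec_ms
```

The two returned values should agree to floating-point precision, so comparing a handful of cepstral coefficients approximates comparing whole log spectra.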

Because the cepstrum therefore makes it simple to compute the difference between power spectra, it is widely used.

However, the cepstrum cannot be obtained directly from the LSP parameters, so the LSP parameters are converted, as in Equation 3 above, into a pseudo-cepstrum that resembles the cepstrum.
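The patent's Equation 3 is not reproduced in this text, so the following is only a sketch of one cosine-sum form that appears in the LSP literature, and it may differ from the patent's own formula. It follows from approximating the squared inverse filter by the product of the two LSP polynomials, whose roots all lie on the unit circle: the n-th coefficient is then (1/n) times the sum of cos(n ω_i) over the line spectral frequencies ω_i, up to a term that does not depend on the speech. The function name and the argument `n_coeffs` are assumptions.

```python
import numpy as np

def lsf_to_pseudo_cepstrum(lsf, n_coeffs):
    """Map line spectral frequencies (radians) to n_coeffs cepstrum-like
    coefficients using the cosine-sum form c_n = (1/n) * sum_i cos(n * w_i).

    Assumed form; not confirmed to be the patent's Equation 3.
    """
    lsf = np.asarray(lsf, dtype=float)
    n = np.arange(1, n_coeffs + 1)
    # Row k holds cos((k + 1) * w_i) for every frequency w_i
    return np.cos(np.outer(n, lsf)).sum(axis=1) / n
```

Because the conversion needs only cosines of the LSP values already present in the vocoder output, it avoids decoding the speech and recomputing spectral features from samples.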

The converted pseudo-cepstrum is used as the feature vector of the test pattern or the reference pattern. If the procedure is being run to store a reference pattern of a speech signal, the feature vector is stored in the reference data memory (16) as the reference pattern. If the procedure is being run for matching, the feature vector is taken as the test pattern of the input speech and compared with the reference patterns read from the reference data memory (16) (step 205). When the similarity between the test pattern and a reference pattern is measured in step 205, the speaking rate of the input speech may differ from that of the stored speech, so the two patterns are time-warped before comparison to reduce the error caused by speaking rate; DTW (Dynamic Time Warping) is used for this purpose. DTW is run once for each registered reference pattern, and once the similarity to every registered reference pattern has been computed, the most similar reference pattern is selected. Several DTW variants have been proposed; one of them measures the spectral distance between the test pattern and each reference pattern in the database and selects the reference pattern with the smallest spectral distance to the test pattern as the recognized pattern.
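The matching step can be sketched with a basic DTW. The patent does not specify the local distance, the path constraints, or the normalization, so the Euclidean frame distance, the three-way step pattern, and the division by the combined pattern length below are assumptions, as are the function names.

```python
import numpy as np

def dtw_distance(test, ref):
    """Time-warped distance between two patterns, each an array of shape
    (number of frames, feature dimension)."""
    test = np.asarray(test, dtype=float)
    ref = np.asarray(ref, dtype=float)
    n, m = len(test), len(ref)
    # Local cost: Euclidean distance between every pair of frames
    local = np.linalg.norm(test[:, None, :] - ref[None, :, :], axis=2)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = local[i - 1, j - 1] + best_prev
    # Normalize so that long and short patterns are comparable
    return acc[n, m] / (n + m)

def recognize(test, references):
    """Run DTW against every registered reference pattern and return the
    label and distance of the closest one."""
    scores = {label: dtw_distance(test, ref) for label, ref in references.items()}
    best = min(scores, key=scores.get)
    return best, scores[best]
```

Here `references` is a hypothetical dictionary mapping each registered word to its stored pattern; the cost grows with the number of registered patterns because DTW is run once per pattern.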

If the reference pattern selected in step 205 is sufficiently similar to the test pattern, for example if its distance from the test pattern is at or below a given threshold, the recognition result is judged to be correct (step 206) and the selected reference pattern is output as the recognition result (step 207). If, on the other hand, the distance between the most similar reference pattern and the test pattern exceeds the threshold, the input is judged to be speech that has not been registered.
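The acceptance test of steps 206 and 207 amounts to a single comparison. In the sketch below the function name, the use of `None` to signal unregistered speech, and the choice to accept when the distance exactly equals the threshold are assumptions; the patent gives no threshold value.

```python
def accept_or_reject(best_label, best_distance, threshold):
    """Return the recognized label if the best match is close enough,
    otherwise None to indicate speech that has not been registered."""
    if best_distance <= threshold:
        return best_label
    return None
```

In practice the threshold would be tuned on recorded data to balance false acceptances against false rejections.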

Accordingly, the present invention is especially efficient when applied to speech recognition in personal mobile communication devices that use a CELP-family vocoder.

As described above, the speech recognition method according to the present invention converts the LSP, which is used as the feature vector when building test patterns and reference patterns, into a pseudo-cepstrum. The resulting feature vector remains useful even for the consonant components of speech, so a speech recognizer with better performance can be implemented. The method is particularly effective on processors that cannot extract feature vectors from speech samples, and when it is applied to speech recognition in a communication device with a built-in vocoder, no separate LSP extraction step is needed, which yields a large saving in computation.

Claims

A speech recognition method in which, when speech is input through a microphone or a telephone, features of the input speech are extracted and the extracted features are used to generate reference patterns and test patterns for matching,

wherein the feature extraction process comprises first extracting line spectral pair (LSP) parameters from the input speech signal and then converting the extracted LSP parameters into a pseudo-cepstrum.

The speech recognition method of claim 1, wherein the pseudo-cepstrum conversion step

is performed by applying the following formula.

A speech recognition method for a telephone equipped with a vocoder that, when speech is input, modulates the speech and then encodes the modulated speech signal,

wherein the line spectral pairs (LSP) encoded and output by the vocoder are converted into a pseudo-cepstrum, which is used as the feature vector for speech recognition.

The speech recognition method of claim 3, wherein the pseudo-cepstrum conversion step

is performed by applying the following formula.