KR20050037125A

KR20050037125A - Apparatus for recognizing a voice by using hidden markov model

Info

Publication number: KR20050037125A
Application number: KR1020030072564A
Authority: KR
Inventors: 정홍; 김용
Original assignee: 학교법인 포항공과대학교
Priority date: 2003-10-17
Filing date: 2003-10-17
Publication date: 2005-04-21

Abstract

The present invention relates to a speech recognition apparatus using an HMM. The apparatus comprises: a sampling and quantization block that converts an analog speech signal into a digital speech signal; a noise removal unit that removes ambient noise from the converted digital speech signal; a feature vector extraction unit that extracts features of the noise-free digital speech signal in vector form; and an HMM-based speech recognition block that outputs a speech recognition result for the extracted feature vectors by running the HMM algorithm on a hardware-implemented architecture, using probability parameters determined in advance through training. The apparatus can therefore perform speech recognition on its own without a computer and can be installed easily in the target device. Its parallel structure enables real-time recognition, so it can be applied in many fields, including speech-recognition home appliances, automobiles, toys, mobile communication terminals, and PDAs.

Description

Speech recognition device using Hidden Markov model {APPARATUS FOR RECOGNIZING A VOICE BY USING HIDDEN MARKOV MODEL}

The present invention relates to a speech recognition apparatus using a Hidden Markov Model (HMM), and more particularly to a speech recognition apparatus in which the HMM used as the speech recognition algorithm is implemented as a hardware architecture with a systolic array structure.

Speech recognition is typically applied to products such as voice-operated telephones, computers, and automobiles, and public interest in it is growing. Nevertheless, the technology has not yet become a deep part of everyday life: few products around us use speech recognition, and those that do offer little practical benefit.

In other countries, by contrast, many communication services and products using speech recognition have appeared. Even so, considering that speech recognition has been studied abroad for more than 40 years, such applications emerged late compared with other technologies.

In practice, speech recognition technology in Korea is still immature. It does not yet fully meet the needs of general users in many respects, namely vocabulary size, speaker independence, recognition method, and operating environment. Speech recognition matching based on the HMM algorithm is what is currently used to meet those needs.

The HMM, the most widely used method for speech recognition, was introduced by Baum and his colleagues in the late 1960s and early 1970s and was applied to speech processing by Baker, Jelinek, and their colleagues in the 1970s.

An HMM-based recognizer assumes that the speech signal can be modeled as a Markov model. In the training phase, the probabilistic parameters of the Markov model are estimated from a speech corpus to build reference Markov models. In the recognition phase, speech is recognized by selecting the reference Markov model most similar to the input speech.

Full-fledged products based on this speech recognition technology began to appear in the late 1980s, and in Korea only after the early 1990s.

Because the speech recognition apparatuses in use today are mostly software-based, applying them in real life requires a hardware architecture in which the evaluation and decoding parts of the recognizer are implemented on a chip.

The present invention was devised to meet this need for a hardware architecture. Its object is to provide a speech recognition apparatus in which the HMM used as the speech recognition algorithm is implemented as a hardware architecture with a systolic array structure.

To achieve this object, the speech recognition apparatus using an HMM according to the present invention comprises: a sampling and quantization block that converts an analog speech signal into a digital speech signal; a noise removal unit that removes ambient noise from the converted digital speech signal; a feature vector extraction unit that extracts features of the noise-free digital speech signal in vector form; and an HMM-based speech recognition block that outputs a speech recognition result for the extracted feature vectors by running the HMM algorithm on a hardware-implemented architecture, using probability parameters determined in advance through training.

The operation of a preferred embodiment of the present invention is described in detail below with reference to the accompanying drawings.

FIG. 1 is a block diagram of the speech recognition apparatus using an HMM according to the present invention. The apparatus includes: a sampling and quantization block (10) that receives the user's speech signal (S1) through a microphone (S2) and converts the analog speech signal into a digital speech signal by sampling and quantizing it; a noise removal unit (20) that, to raise the recognition rate, removes the ambient noise from the converted digital signal and leaves only the user's speech; a feature vector extraction unit (30) that extracts features of the noise-free digital speech signal in vector form for use in the matching stage, where the actual recognition takes place; and an HMM-based speech recognition block (40) that receives the extracted feature vectors and outputs a speech recognition result by running the HMM algorithm with probability parameters determined in advance through training, on an architecture in which the matching part of the recognition is implemented in hardware.

Based on this configuration, the operation of the speech recognition apparatus using an HMM according to the present invention is described in more detail.

First, the sampling and quantization block (10) receives the user's speech signal (S1) through the microphone (S2), converts the analog speech signal into a digital speech signal by sampling and quantizing it, and passes the result to the noise removal unit (20).
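The conversion performed by block (10) can be illustrated with a minimal sketch. The source does not give a sampling rate, bit depth, or quantizer type, so the uniform quantizer, the function name `sample_and_quantize`, and all parameter values below are assumptions chosen only for illustration.

```python
import math

def sample_and_quantize(signal, n_samples, sample_rate_hz, bits):
    """Sample a continuous-time signal and quantize it to signed integers.

    signal: function of time (seconds) returning an amplitude in [-1.0, 1.0].
    All names and parameters are illustrative assumptions.
    """
    levels = 2 ** (bits - 1) - 1            # e.g. 127 for 8-bit signed samples
    out = []
    for n in range(n_samples):
        x = signal(n / sample_rate_hz)      # sampling at t = n / fs
        x = max(-1.0, min(1.0, x))          # clip to the valid input range
        out.append(round(x * levels))       # uniform quantization
    return out

# A 1 kHz tone, 8 samples at 8 kHz, 8-bit resolution (hypothetical values)
pcm = sample_and_quantize(lambda t: math.sin(2 * math.pi * 1000 * t), 8, 8000, 8)
```

Each element of `pcm` is one digital sample of the kind the noise removal unit (20) would receive.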

The noise removal unit (20) removes ambient noise from the converted digital speech signal to raise the recognition rate, and passes only the remaining user speech to the feature vector extraction unit (30).

The feature vector extraction unit (30) extracts the features of the noise-free digital speech signal in vector form so that they can be used in the matching stage, where the actual recognition takes place, and passes them to the HMM-based speech recognition block (40).

The HMM-based speech recognition block (40) is the core of the present invention: it is the architecture that implements the matching part in hardware. It receives the feature vectors extracted by the feature vector extraction unit (30) and outputs a speech recognition result by running the HMM algorithm with probability parameters determined in advance through training.

The part of the HMM-based speech recognition block (40) that outputs the recognition result through the HMM algorithm is described in more detail below.

The HMM algorithm uses an HMM, which is more complex than the conventional reference Markov model, in order to accommodate the wide variation in speech patterns.

An HMM is a doubly stochastic process: an unobservable process is estimated through another process that generates observable symbols. This makes it a suitable model for a process such as speech, which is highly variable and whose generation mechanism is unknown.

Specifically, an HMM is a collection of states connected by state transitions, and each transition is associated with two kinds of probability. The first is the transition probability, which governs changes of state. The second is the output probability (observation probability), which specifies the conditional probability that each output symbol, drawn from a finite set of observable symbols, is emitted when a transition is made.

Consequently, the HMM represents each feature of a speech pattern by state transition probabilities and output probabilities under two assumptions: the current state depends only on the immediately preceding state, and the outputs are independent of one another.

The elements used in the three tasks that must be carried out to implement a speech recognition system with an HMM are defined as follows.

N is the number of states in the model. The set of states is $S = \{S_1, S_2, \ldots, S_N\}$, and the state at time t is denoted $q_t$.

M is the number of output symbols per state. The set of symbols is $V = \{v_1, v_2, \ldots, v_M\}$.

$A = \{a_{ij}\}$ is the state transition probability distribution, where $a_{ij} = P(q_{t+1} = S_j \mid q_t = S_i)$, $1 \le i, j \le N$.

$B = \{b_j(k)\}$ is the state observation probability distribution, where $b_j(k) = P(v_k \text{ at } t \mid q_t = S_j)$, $1 \le j \le N$, $1 \le k \le M$.

$\pi = \{\pi_i\}$ is the initial state probability distribution, where $\pi_i = P(q_1 = S_i)$, $1 \le i \le N$.

The output symbol sequence is $O = O_1 O_2 \cdots O_T$.

An HMM defined in this way is written compactly as $\lambda = (A, B, \pi)$. For such an HMM to be used in practice, three tasks must be carried out.
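Taken together with the two assumptions stated earlier (each state depends only on the previous state, and the outputs are independent of one another), the model assigns the following joint probability to a state sequence $Q = q_1 q_2 \cdots q_T$ and an observation sequence $O = O_1 O_2 \cdots O_T$. This is a sketch in standard textbook notation, which is an assumption here: $a$ denotes the transition probabilities, $b$ the output probabilities, $\pi$ the initial state probabilities, and $\lambda$ the model as a whole.

```latex
P(O, Q \mid \lambda) = \pi_{q_1}\, b_{q_1}(O_1) \prod_{t=2}^{T} a_{q_{t-1} q_t}\, b_{q_t}(O_t)
```

The evaluation task sums this quantity over all state sequences Q, while the decoding task maximizes it over Q.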

First, evaluation: given a model and an observation sequence, the probability of the observation sequence is computed. The forward algorithm is used for this.
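The forward algorithm is only named in the text, so the following is a minimal software sketch of the textbook form for a discrete-output HMM, not the hardware design of the invention. The function name `forward_probability` and the list-based layout of `pi`, `A`, and `B` are assumptions.

```python
def forward_probability(pi, A, B, obs):
    """Return P(obs | model) computed with the forward algorithm.

    pi[i]   : initial probability of state i
    A[i][j] : transition probability from state i to state j
    B[i][k] : probability of emitting symbol k in state i
    obs     : list of observed symbol indices
    """
    n = len(pi)
    # Initialization: alpha_1(i) = pi_i * b_i(O_1)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    # Induction: alpha_t(j) = (sum_i alpha_{t-1}(i) * a_ij) * b_j(O_t)
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    # Termination: sum over all possible final states
    return sum(alpha)

# Toy two-state left-to-right model (hypothetical values)
pi = [1.0, 0.0]
A = [[0.5, 0.5], [0.0, 1.0]]
B = [[1.0, 0.0], [0.0, 1.0]]
p = forward_probability(pi, A, B, [0, 1])
```

The cost is proportional to N² per observation, compared with enumerating every possible state sequence.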

Second, decoding, which is needed in the recognition phase: given an observation sequence, the optimal state sequence is selected. The Viterbi algorithm is used for this.

The Viterbi algorithm is based on the quantity

$\delta_t(i) = \max_{q_1, q_2, \ldots, q_{t-1}} P(q_1 q_2 \cdots q_{t-1},\, q_t = S_i,\, O_1 O_2 \cdots O_t \mid \lambda)$,

where $\delta_t(i)$ is the best score (highest probability) accumulated along the single most probable path that ends in state i at time t.

The cost of the optimal state sequence is computed recursively as follows.

1. Initialization: $\delta_1(i) = \pi_i\, b_i(O_1)$, $1 \le i \le N$.

2. Recursion: $\delta_t(j) = \max_{1 \le i \le N} [\delta_{t-1}(i)\, a_{ij}]\, b_j(O_t)$, $2 \le t \le T$, $1 \le j \le N$. Multiplication is difficult and costly to implement in hardware, so $-\log$ is taken on both sides: the products become sums and the maximum becomes a minimum. The matching cost can then be computed with simple adders and comparators alone, which makes a parallel hardware implementation possible.

3. Termination: $P^* = \max_{1 \le i \le N} \delta_T(i)$.
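The three steps above, with the -log transformation applied, can be sketched in software as follows. This is an illustrative sequential version under assumed names (`neg_log`, `viterbi_min_cost`) and an assumed list-based parameter layout. The invention performs the same additions and comparisons in parallel hardware.

```python
import math

def neg_log(p):
    """Return -log(p), using +inf for impossible events."""
    return -math.log(p) if p > 0.0 else math.inf

def viterbi_min_cost(pi, A, B, obs):
    """Return (best_cost, best_path) using only additions and comparisons
    on -log probabilities."""
    n = len(pi)
    cA = [[neg_log(a) for a in row] for row in A]
    cB = [[neg_log(b) for b in row] for row in B]
    # 1. Initialization: cost_1(i) = -log(pi_i) - log(b_i(O_1))
    cost = [neg_log(pi[i]) + cB[i][obs[0]] for i in range(n)]
    back = []
    # 2. Recursion: cost_t(j) = min_i [cost_{t-1}(i) + cA[i][j]] + cB[j][O_t]
    for o in obs[1:]:
        new_cost, ptr = [], []
        for j in range(n):
            best_i = min(range(n), key=lambda i: cost[i] + cA[i][j])
            new_cost.append(cost[best_i] + cA[best_i][j] + cB[j][o])
            ptr.append(best_i)
        cost = new_cost
        back.append(ptr)
    # 3. Termination: the smallest final cost, then backtrack to get the path
    last = min(range(n), key=lambda i: cost[i])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return cost[last], path

# Toy two-state left-to-right model (hypothetical values)
best_cost, best_path = viterbi_min_cost(
    [1.0, 0.0], [[0.5, 0.5], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]], [0, 1])
```

Because every operation inside the recursion is an addition or a comparison, each state's update maps naturally onto an adder-and-comparator cell.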

This is explained with reference to the example of the speech recognition process shown in FIG. 2.

FIG. 2 shows the speech waveform (S2-1) of the utterance "여러분 안녕하십니까" ("Hello, everyone") and the HMM (S2-4) for the portion corresponding to "여러".

"여러" is represented as a concatenation of the phoneme HMMs "o", "ㅕ", "ㄹ", and "ㅓ". The figure shows how the feature vectors (S2-3) are generated in each state (S2-2) of the model for "ㅕ".

That is, x₁, x₂, and x₃ are generated in state S₁; x₄ and x₅ in state S₂; and x₆ and x₇ in state S₃.

One feature vector (S2-3) is extracted from the speech signal per frame-shift interval. The transition probability represents how often state $s_i$ transitions to state $s_j$, and the observation probability density function $b_i(x_t)$ gives the probability of observing feature vector $x_t$ in state $s_i$.

In an HMM that allows only left-to-right transitions, the transition probabilities can be obtained from the ratio between the number of frames that stay in each state and the number of frames that move on to the next state.
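One common way to turn such frame counts into probabilities is to count frame-to-frame transitions in a state alignment. The source states only that a ratio of frame counts is used, so the exact estimator below, the function name `estimate_transitions`, and the alignment format are assumptions.

```python
from collections import Counter

def estimate_transitions(alignment):
    """Estimate a_ij from a frame-to-state alignment by counting.

    alignment: list of state indices, one per frame.
    Returns {(i, j): probability of moving from state i to state j}.
    """
    pair_counts = Counter(zip(alignment, alignment[1:]))  # (from, to) pairs
    from_counts = Counter(alignment[:-1])                 # transitions leaving each state
    return {(i, j): c / from_counts[i] for (i, j), c in pair_counts.items()}

# Alignment from the FIG. 2 example: x1..x3 in S1, x4..x5 in S2, x6..x7 in S3
a = estimate_transitions([1, 1, 1, 2, 2, 3, 3])
```

For the FIG. 2 example, state S₁ has two self-transitions and one move to S₂, so the estimate is 2/3 for staying and 1/3 for moving on.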

Third, training: the model parameters are adjusted to obtain the best model for the training data. This problem is solved with the Baum-Welch algorithm.

Of the three tasks needed to use an HMM in practice, the present invention implements evaluation and decoding in hardware, while training is implemented in software as in existing methods.

FIG. 3 shows the overall hardware architecture of the HMM algorithm according to the present invention. It comprises: a memory (S3-1) that stores the transition probabilities (a) and output probabilities (b, observation probabilities) obtained in advance through training; left-side registers (S3-2-0, S3-2-j-1, S3-2-j, S3-2-j+1, S3-2-N-1) and right-side registers (S3-4-0, S3-4-j-1, S3-4-j, S3-4-j+1, S3-4-N-1) that temporarily hold the probability values read from the memory (S3-1) and supply them to the processing elements; N processing elements (S3-3-0, S3-3-j-1, S3-3-j, S3-3-j+1, S3-3-N-1), each connected to the elements one level and two levels below it, which perform computations on the probability values supplied by the left and right registers and pass the results to a matching system (S3-5); and the matching system (S3-5), which makes the final matching decision from the values computed by the processing elements.

First, the internal structure of any one of the processing elements (S3-3-0, S3-3-j-1, S3-3-j, S3-3-j+1, S3-3-N-1) is described with reference to FIG. 4.

The cost values computed at the previous time t-1 are each added to the corresponding state transition probability by the adders (S4-2, S4-3, S4-4), and the three resulting sums are fed to the comparator (S4-1).

The comparator (S4-1) outputs the minimum of the three inputs supplied by the adders (S4-2, S4-3, S4-4). The state observation probability is then added to this minimum by the adder (S4-5), and the result is the cost value for the current time t.

For the computation at the next time t+1, the cost value computed by the adder (S4-5) passes through the right-side register (S4-6) shown in FIG. 3 and is then fed back to the comparator (S4-1) of its own processing element and to the inputs of the two processing elements above it.
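The datapath described for FIG. 4 can be mirrored in a few lines of software. This is a behavioral sketch only: the function name `processing_element` and its argument layout are assumptions, and all quantities are taken to be -log probabilities, in line with the adder-and-comparator formulation described earlier.

```python
import math

def processing_element(prev_costs, trans_costs, obs_cost):
    """One update of a processing element, following the FIG. 4 description.

    prev_costs : costs at time t-1 for the three incoming paths
    trans_costs: -log transition probabilities for those three paths
    obs_cost   : -log observation probability of this state at time t
    """
    # Three adders (S4-2, S4-3, S4-4): previous cost + transition cost
    sums = [c + a for c, a in zip(prev_costs, trans_costs)]
    # Comparator (S4-1): select the minimum of the three sums
    best = min(sums)
    # Final adder (S4-5): add the observation cost
    return best + obs_cost

# Hypothetical values: three incoming paths with different accumulated costs
cost_t = processing_element([1.0, 2.0, 0.5], [0.3, 0.1, 2.0], 0.2)

# An element with no valid lower neighbours can be fed +inf on unused inputs
edge_cost = processing_element([0.0, math.inf, math.inf], [0.7, 0.0, 0.0], 0.1)
```

The returned value is what would be latched in the register and fed back for time t+1.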

FIG. 5 illustrates the HMM model algorithm according to the present invention.

The state transition probabilities follow a left-right model with the property $a_{ij} = 0$ for $j < i$ and for $j > i + \Delta$.

In other words, the state never moves backward and never jumps by a large amount. To obtain the transitions shown in FIG. 5, $\Delta$ is 2. The initial state must always be state 1, and matching must start at the origin of FIG. 5, so the condition $\pi_1 = 1$ (with $\pi_i = 0$ for $i \ne 1$) must hold.
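The left-right constraint can be checked mechanically. The sketch below uses 0-based state indices and hypothetical helper names (`is_left_right`, `allowed_predecessors`). It also shows why each state has at most three predecessors when the maximum jump is 2, which matches the three adders in each processing element.

```python
def is_left_right(A, delta=2):
    """Return True if a_ij == 0 whenever j < i or j > i + delta."""
    n = len(A)
    return all(A[i][j] == 0.0
               for i in range(n) for j in range(n)
               if j < i or j > i + delta)

def allowed_predecessors(j, delta=2):
    """States (0-based) that may transition into state j."""
    return list(range(max(0, j - delta), j + 1))

# Hypothetical 4-state transition matrix with a maximum jump of 2
A = [
    [0.6, 0.3, 0.1, 0.0],
    [0.0, 0.5, 0.3, 0.2],
    [0.0, 0.0, 0.7, 0.3],
    [0.0, 0.0, 0.0, 1.0],
]
ok = is_left_right(A)
preds = allowed_predecessors(3)
```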

FIGS. 6A to 6C show the data flow of the processing elements (S3-3-0, S3-3-j-1, S3-3-j, S3-3-j+1, S3-3-N-1) shown in FIG. 3.

Referring to FIG. 6A, the feature vectors (t₀, t₁, ..., t_{N-1}) of one frame supplied by the feature vector extraction unit (30) of FIG. 1 are loaded into area A of the memory (S3-1), and the corresponding transition probabilities, the a parameters, are serially loaded into the processing elements (S3-3-0, S3-3-j-1, S3-3-j, S3-3-j+1, S3-3-N-1) through the left-side registers (S3-2-0, S3-2-j-1, S3-2-j, S3-2-j+1, S3-2-N-1). With N processing elements, this takes 3N+1 clock cycles.

Next, in the second drawing of FIG. 6A, the feature vectors (t₀, t₁, ..., t_{N-1}) of one frame supplied by the feature vector extraction unit (30) are loaded into area B of the memory (S3-1), and the corresponding probability parameters, the b parameters, are serially loaded into the processing elements (S3-3-0, S3-3-j-1, S3-3-j, S3-3-j+1, S3-3-N-1) through the right-side registers (S3-4-0, S3-4-j-1, S3-4-j, S3-4-j+1, S3-4-N-1). This takes N+1 clock cycles.

Once both the a and b parameters have been loaded into the processing elements, the matching cost is computed from the updated values, as shown in the third drawing of FIG. 6A.

FIG. 6B shows the process of FIG. 6A being repeated while the feature vectors (t₀, t₁, ..., t_{N-1}) are shifted by one position at a time, until all the feature vectors of the frame have been processed.

Then, as shown in FIG. 6C, once the iteration is complete the feature vectors return to their original positions in preparation for matching against the next reference. When this has been done for all references, the largest of the matching cost values is determined, and the corresponding reference is the speech recognition result.
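A minimal sketch of this final decision, assuming each reference model has already produced one matching cost. The text speaks of the largest value. Because the costs described earlier are negative log probabilities, this sketch assumes that the largest probability corresponds to the smallest cost and therefore selects the minimum, which is an interpretation and not something the source states. The function name `recognize`, the reference labels, and the cost values are made up for illustration.

```python
def recognize(costs_by_reference):
    """Pick the reference whose matching cost (-log probability) is smallest,
    i.e. whose probability is largest."""
    return min(costs_by_reference, key=costs_by_reference.get)

# Hypothetical matching costs for three reference models
result = recognize({"yes": 41.7, "no": 35.2, "stop": 58.9})
```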

FIG. 7 shows how the next frame is formed. When the process of FIG. 6 has finished for one frame, the frame must be advanced for the next recognition: the feature vector at the front of the frame is removed, the remaining feature vectors are shifted by one position, and a new feature vector is appended at the end. The process of FIG. 6 is then repeated on the new frame to extract the speech recognition result.
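The frame update described for FIG. 7 behaves like a fixed-length sliding window. The sketch below models it with a double-ended queue. The function name `next_frame` and the string placeholders for feature vectors are assumptions for illustration.

```python
from collections import deque

def next_frame(frame, new_vector):
    """Drop the oldest feature vector, shift the rest, append the new one."""
    frame.popleft()           # remove the feature vector at the front
    frame.append(new_vector)  # the remaining vectors have shifted by one slot
    return frame

# Hypothetical frame of four feature vectors, labelled as in the text
frame = deque(["t0", "t1", "t2", "t3"])
next_frame(frame, "t4")
```

After the call, `frame` holds t1 through t4, ready for the next pass of the FIG. 6 process.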

As described above, by implementing the HMM used as the speech recognition algorithm as a hardware architecture with a systolic array structure, the present invention can perform speech recognition on its own without a computer and can be installed easily in the target device. Its parallel structure enables real-time recognition, so it can be applied in many fields, including speech-recognition home appliances, automobiles, toys, mobile communication terminals, and PDAs.

FIG. 1 is a block diagram of the speech recognition apparatus using an HMM according to the present invention.

FIG. 2 illustrates an example of the speech recognition process.

FIG. 3 shows the overall hardware architecture of the HMM algorithm according to the present invention.

FIG. 4 shows the internal structure of a processing element according to the present invention.

FIG. 5 illustrates the HMM model algorithm according to the present invention.

FIG. 6 shows the data flow of the processing elements shown in FIG. 3.

FIG. 7 shows the process of forming the next frame according to the present invention.

Description of reference numerals for the main parts of the drawings:

10: sampling and quantization block
20: noise removal unit
30: feature vector extraction unit
40: HMM-based speech recognition block
S1: analog speech signal
S2: microphone

Claims

A speech recognition apparatus using a Hidden Markov Model (HMM), comprising:

a sampling and quantization block that converts an analog speech signal into a digital speech signal;

a noise removal unit that removes ambient noise from the converted digital speech signal;

a feature vector extraction unit that extracts features of the noise-free digital speech signal in vector form; and

an HMM-based speech recognition block that outputs a speech recognition result for the extracted feature vectors by running the HMM algorithm on a hardware-implemented architecture, using probability parameters determined in advance through training.

The apparatus of claim 1, wherein the HMM-based speech recognition block comprises:

a memory that stores transition probabilities and output probabilities obtained in advance through training;

registers that temporarily hold the probability values stored in the memory;

a plurality of processing elements, each connected to the elements one level and two levels below it, that perform computations on the probability values supplied by the registers; and

a matching system that makes the final matching decision from the computed values.

The apparatus of claim 2, wherein each processing element comprises:

first adders that each add a state transition probability to a cost value computed at the previous time;

a comparator that selects and outputs the minimum of the resulting sums; and

a second adder that adds the state observation probability to the output minimum.

The apparatus of claim 3, wherein the cost value for the current time t produced by the second adder is fed, for the computation at the next time, to the processing element itself and to the inputs of the two processing elements.

A speech recognition method using an HMM, for extracting a speech recognition result in processing elements, comprising:

storing in a memory the feature vectors (t₀, t₁, ..., t_{N-1}) of one frame;

serially updating the corresponding transition probability parameters in the processing elements through left-side and right-side registers;

after the parameters have been updated, computing a matching cost value from the updated values; and

extracting a speech recognition result by repeating the storing and updating steps while shifting the feature vectors by one position at a time, until all the feature vectors of the one frame have been processed.

The method of claim 5, wherein, when all the feature vectors of the one frame have been processed, the first feature vector of the frame is removed, the remaining feature vectors are shifted by one position, a new feature vector is appended at the end, and a speech recognition result is extracted for the feature vectors of the resulting next frame.