KR950001068B1

KR950001068B1 - Speech signal processing device

Info

Publication number: KR950001068B1
Application number: KR1019940023426A
Authority: KR
Inventors: 죠지 카네; 아끼라 노하라
Original assignee: 마쯔시다덴기산교 가부시기가이샤; 다니이 아끼오
Priority date: 1990-01-18
Filing date: 1994-09-15
Publication date: 1995-02-08
Anticipated expiration: 2011-01-18

Abstract

None.

Description

Voice signal processing device

FIGS. 1 and 2 are block diagrams of conventional speech signal processing apparatuses.

FIG. 3 is a block diagram of a first embodiment of the present invention.

FIG. 4 is a cepstrum characteristic diagram for explaining the operation of the first embodiment.

FIG. 5 is a block diagram of a speech signal processing apparatus according to a second embodiment of the present invention.

FIG. 6 is a block diagram of a speech signal processing apparatus according to a third embodiment of the present invention.

* Explanation of reference numerals for the main parts of the drawings

81, 101, 208: cepstrum calculation unit
82, 102, 209: peak detection unit
83: speech discrimination unit
84: analysis interval setting unit
85, 211: analysis interval classification unit
86, 212: analysis interval memory
103: control unit
104: peak value memory
105, 213: speech analysis unit
106, 214: speech detection unit
107, 215: matching unit
210: analysis interval processing unit

The present invention relates to a speech signal processing apparatus that can be used for speech processing.

In recent years, speech signal processing apparatuses that detect the presence or absence of speech have come into wide use for applications such as speech recognition, speaker recognition, operation of equipment by voice, and voice input to computers.

FIG. 1 is a block diagram showing a conventional speech signal processing apparatus. Its configuration and operation are described below with reference to the drawing. The cepstrum detection unit 7 detects the cepstrum from the speech input and supplies it to the peak detection unit. The peak detection unit 8 detects a peak from the cepstrum and supplies it to the speech discrimination unit 9. The speech discrimination unit 9 compares the cepstrum peak with a threshold to determine the presence or absence of speech, and outputs a speech detection signal.
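The conventional chain described above (cepstrum, peak search, threshold comparison) can be sketched as follows. This is only an illustrative sketch, not the circuit of FIG. 1: the function names, the naive DFT, the small constant added before the logarithm, and the `lo`, `hi`, and `threshold` parameters are all assumptions introduced for the example.

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) discrete Fourier transform, adequate for one short frame."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def real_cepstrum(frame):
    """Real cepstrum: inverse DFT of the log magnitude spectrum."""
    n = len(frame)
    # The small constant (an assumption) avoids log(0) on empty spectral bins.
    log_mag = [math.log(abs(c) + 1e-6) for c in dft(frame)]
    # log_mag is real and even-symmetric for a real frame, so the real part
    # of the forward DFT divided by n equals the inverse DFT.
    return [c.real / n for c in dft(log_mag)]

def detect_voice(frame, lo, hi, threshold):
    """Search quefrencies lo..hi for the cepstrum peak and compare it
    with a threshold. Returns (speech_present, peak_quefrency)."""
    cep = real_cepstrum(frame)
    peak_q = max(range(lo, hi + 1), key=lambda q: cep[q])
    return cep[peak_q] > threshold, peak_q
```

For a periodic (voiced-like) frame the peak appears at the quefrency equal to the pitch period in samples, which is why the width of the searched range `lo..hi` directly determines the cost of the peak search.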

However, such a conventional signal processing apparatus has two problems: the processing time needed to find the peak of the cepstrum obtained from the cepstrum detection unit is very long, and speech is easily detected erroneously when noise is superimposed on the speech.

FIG. 2 is also a block diagram of a conventional signal processing apparatus. As shown in the figure, the cepstrum calculation unit 16 calculates the cepstrum of the speech input and supplies it to the peak detection unit 17. The peak detection unit 17 receives the cepstrum from the cepstrum calculation unit 16, detects a peak, and supplies it to the speech detection unit 19. The speech detection unit 19 receives the peak signal from the peak detection unit 17 and supplies the speech detection result to the matching unit 20. The speech analysis unit 18 analyzes the speech input and supplies the result to the matching unit 20. The matching unit 20 is controlled by the control signal supplied from the speech detection unit 19 and produces a recognition output.

The operation of the conventional signal processing apparatus configured in this way is as follows. The cepstrum of the speech input is calculated by the cepstrum calculation unit 16.

The peak detection unit 17 then detects the cepstrum peak. The speech detection unit 19 determines the presence or absence of speech from the presence or absence and the magnitude of the cepstrum peak, and supplies the result to the matching unit 20 as a control signal. Meanwhile, so that the matching unit 20 can perform pattern matching, the speech analysis unit 18 analyzes the speech input and supplies the analyzed speech input to the matching unit 20. Under the control signal from the speech detection unit 19, the matching unit 20 matches the signal supplied from the speech analysis unit 18 against reference patterns to perform speech recognition and obtain a recognition output. Here, the control signal from the speech detection unit 19 controls the matching unit 20 so that it performs the matching operation when speech is detected.

However, because such a conventional signal processing apparatus always performs the matching operation whenever speech is input, it also operates on speech inputs that are not recognition targets and performs unnecessary signal processing. This wastes processing time and makes erroneous recognition likely, and the apparatus cannot distinguish among a plurality of input signals.

A first object of the present invention is to provide a speech signal processing apparatus that requires only a short processing time to find the peak of the cepstrum obtained by cepstrum detection.

A second object of the present invention is to provide an apparatus that operates only on speech inputs that are recognition targets, by detecting speech accurately using the cepstrum analysis method.

A third object of the present invention is to provide an apparatus that, by detecting speech accurately through cepstrum analysis, efficiently performs the recognition operation only for those of a plurality of input signals that have been registered.

To achieve the first object, a speech signal processing apparatus according to the present invention comprises: a cepstrum calculation unit that receives speech and calculates its cepstrum; a peak detection unit that detects a peak of the cepstrum within a designated analysis interval; a speech discrimination unit that obtains a speech detection output from the peak detection output; an analysis interval setting unit that calculates an optimal analysis interval from the peak detection output and designates an analysis interval to the peak detection unit; an analysis interval memory that stores analysis intervals classified on the basis of the optimal analysis interval; and an analysis interval classification unit that, in response to a mode setting input, designates the analysis interval that the analysis interval setting unit designates to the peak detection unit and, also in response to the mode setting input, collates the optimal analysis interval with the analysis interval memory and instructs the analysis interval setting unit accordingly.

With the above configuration, the cepstrum calculation unit calculates the cepstrum of the speech input and supplies it to the peak detection unit. The peak detection unit detects the peak of the cepstrum supplied from the cepstrum calculation unit in accordance with the analysis interval input from the analysis interval setting unit.

The speech discrimination unit then determines the presence or absence of speech from part of the signal from the peak detection unit and produces the speech detection output. The interval setting operation of the analysis interval setting unit and the classification operation of the analysis interval classification unit are performed as follows. First, when the mode setting input is the "registration" mode, the analysis interval setting unit supplies a predetermined wide analysis interval to the peak detection unit and, at the same time, calculates the optimal analysis interval corresponding to the cepstrum peak of the speech input supplied from the peak detection unit and supplies it to the analysis interval classification unit. The analysis interval classification unit compares the optimal analysis interval data with the analysis interval data stored in the analysis interval memory and, if it is of a different kind, additionally stores it in the analysis interval memory. Next, when the mode setting input is the "recognition" mode, the analysis interval setting unit, as instructed by the analysis interval classification unit, supplies the peak detection unit with either analysis interval data supplied from the analysis interval memory or the predetermined wide analysis interval setting and, at the same time, calculates the optimal analysis interval corresponding to the cepstrum peak of the speech input supplied from the peak detection unit and supplies it to the analysis interval classification unit. The analysis interval classification unit selects from the memory an analysis interval similar to the optimal analysis interval and directs that it be supplied to the analysis interval setting unit. Here, a similar analysis interval is one for which the portion where the two analysis intervals overlap exceeds a predetermined ratio.
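The similarity test on two analysis intervals (the quefrency ranges searched for the peak) can be illustrated with a short sketch. The text does not say what the overlap is measured against, so this example assumes the ratio is the overlap length divided by the length of the optimal interval; the function names and the default `min_ratio` of 0.5 are likewise hypothetical.

```python
def overlap_ratio(optimal, stored):
    """Fraction of the optimal interval that is covered by the stored one.

    Both intervals are (lower, upper) quefrency pairs. Measuring the overlap
    relative to the optimal interval is an assumption of this sketch.
    """
    (a1, b1), (a2, b2) = optimal, stored
    overlap = max(0.0, min(b1, b2) - max(a1, a2))
    width = b1 - a1
    return overlap / width if width > 0 else 0.0

def is_similar(optimal, stored, min_ratio=0.5):
    """True when the overlap exceeds the predetermined ratio."""
    return overlap_ratio(optimal, stored) > min_ratio
```

For example, `is_similar((40, 60), (45, 80))` holds because 15 of the 20 quefrency steps overlap, while `is_similar((40, 60), (70, 90))` does not because the intervals are disjoint.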

To achieve the second object, a speech signal processing apparatus according to the present invention comprises: a speech analysis unit that analyzes a speech input and outputs an analysis signal; a cepstrum calculation unit that calculates and outputs the cepstrum of the speech input; a peak detection unit that detects the peak of the cepstrum and outputs a peak signal; a speech detection unit that determines the presence or absence of speech from the peak signal and outputs a first control signal; a control unit that is provided with a peak value memory for storing the peak signal, writes the peak signal into the memory in response to the "setting" mode of a mode setting input and, in response to the "recognition" mode of the mode setting input, compares the peak signal in the memory with the cepstrum peak signal of the speech input and outputs a second control signal according to the difference between their quefrencies; and a matching unit that, in response to the input of the first control signal and the second control signal, compares the analysis signal of the speech analysis unit with templates and produces a recognition output.

With the above configuration, the cepstrum peak of the speech input is detected via the cepstrum calculation unit and the peak detection unit. The speech detection unit determines the presence or absence of speech on the basis of the cepstrum peak and supplies the matching unit with a first control signal corresponding to the presence or absence of speech. In the control unit, when the mode setting input is the "registration" mode, the cepstrum peak signal obtained by the peak detection unit is stored in the peak value memory; when the mode setting input is the "recognition" mode, the cepstrum peak signal obtained by the peak detection unit is compared with the peak signal stored in the peak value memory, and a second control signal is supplied to the matching unit according to the difference between their quefrencies. The speech input is also analyzed by the speech analysis unit into a form usable by the matching unit, and the matching unit matches it against previously registered data to obtain a recognition output. The start of the matching operation is controlled by the first control signal from the speech detection unit and the second control signal from the control unit.

That is, the first control signal from the speech detection unit starts the matching operation when speech is detected, and the second control signal from the control unit starts the matching operation when, with the mode setting input in the "recognition" mode, it is judged that there is no difference between the quefrency value of the cepstrum peak signal of the speech input and the quefrency value of a peak signal registered in the memory beforehand while the mode setting input was in the "setting" mode.
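The gating of the matching operation by the two control signals might be sketched as below. The class and function names are hypothetical, and the `tolerance` used to decide that two quefrency values show "no difference" is an assumption; the text does not specify how that judgment is made.

```python
class PeakGate:
    """Sketch of the control unit together with its peak value memory."""

    def __init__(self, tolerance):
        # Largest quefrency difference still treated as "no difference"
        # (an assumed parameter, not given in the text).
        self.tolerance = tolerance
        self.registered = []

    def register(self, peak_quefrency):
        """Setting mode: store the peak quefrency of a registered speaker."""
        self.registered.append(peak_quefrency)

    def second_control(self, peak_quefrency):
        """Recognition mode: True when the input peak matches a stored one."""
        return any(abs(peak_quefrency - q) <= self.tolerance
                   for q in self.registered)

def should_match(voice_detected, gate, peak_quefrency):
    """Matching starts only when both control signals allow it."""
    return voice_detected and gate.second_control(peak_quefrency)
```

With a gate holding a registered peak at quefrency 32 and a tolerance of 2, an input peak at 33 would start matching, whereas a peak at 60, or any input for which no speech was detected, would not.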

To achieve the third object, a speech signal processing apparatus according to the present invention comprises: a speech analysis unit that analyzes a speech input and outputs an analysis signal; a cepstrum calculation unit that calculates and outputs the cepstrum of the speech signal; a peak detection unit that detects and outputs the peak of the cepstrum within a designated interval; a speech detection unit that outputs, from the output of the peak detection unit, a first control signal corresponding to the presence or absence of a speech signal; an analysis interval processing unit that designates and outputs an analysis interval to the peak detection unit and also calculates and outputs an optimal analysis interval corresponding to the cepstrum peak; an analysis interval memory that stores analysis intervals classified on the basis of the optimal analysis interval; an analysis interval classification unit that, in response to a mode setting input, designates the analysis interval that the analysis interval processing unit designates to the peak detection unit and, also in response to the mode setting input, collates the optimal analysis interval with the analysis interval data in the analysis interval memory to output a second control signal corresponding to a recognition target of the speech signal, while also classifying the analysis interval data in the analysis interval memory and designating an analysis interval to the analysis interval processing unit; and a matching unit that, in response to the input of the first control signal and the second control signal, compares the analysis signal of the speech analysis unit with templates and produces a recognition output.

With the above configuration, the cepstrum peak of the speech input signal is detected, in passing through the cepstrum calculation unit and the peak detection unit, within the analysis interval designated by the analysis interval processing unit. The speech detection unit determines the presence or absence of speech on the basis of the cepstrum peak and supplies the first control signal to the matching unit. The analysis interval given to the peak detection unit depends on the mode of the mode setting input, as follows. First, when the mode setting input is the "registration" mode, the analysis interval processing unit supplies a predetermined analysis interval to the peak detection unit, and calculates the optimal analysis interval corresponding to the cepstrum peak and outputs it to the analysis interval classification unit. The analysis interval classification unit performs classification as follows. It compares the optimal analysis interval with the analysis interval memory. If the interval data in the memory includes an analysis interval that overlaps the optimal analysis interval by at least a predetermined ratio (this is defined as a similar analysis interval), it supplies that similar analysis interval to the peak detection unit via the analysis interval processing unit and, at the same time, replaces that analysis interval in the memory with an analysis interval synthesized as described below; if there is no similar analysis interval, it writes the optimal analysis interval into the analysis interval memory. The synthesized analysis interval includes the portion where the optimal analysis interval and the analysis interval given by the memory data overlap, and its lower and upper limits lie within one or the other of those analysis intervals. Next, when the mode setting input is the "recognition" mode, the analysis interval processing unit supplies the predetermined analysis interval to the peak detection unit, and calculates the optimal analysis interval corresponding to the peak and outputs it to the analysis interval classification unit.
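The registration-mode classification described above might look like the following sketch. Two points are assumptions, not statements of the patent: the overlap ratio is measured relative to the optimal interval, and the synthesized interval is taken to be the union of the two intervals, which is one choice that contains the overlap and has both limits inside one of the two intervals. The function name and the default `min_ratio` are hypothetical.

```python
def register_interval(memory, optimal, min_ratio=0.5):
    """Registration mode: classify an optimal interval against the memory.

    memory is a list of (lower, upper) quefrency pairs and is updated in
    place. Returns the similar stored interval that would be handed to the
    peak detector, or None when the optimal interval is stored as new.
    """
    a, b = optimal
    for i, (c, d) in enumerate(memory):
        overlap = max(0, min(b, d) - max(a, c))
        if b > a and overlap / (b - a) >= min_ratio:
            # Assumed synthesis rule: replace the stored interval by the union.
            memory[i] = (min(a, c), max(b, d))
            return (c, d)
    memory.append(optimal)
    return None
```

Starting from an empty memory, registering (27, 37) stores it as a new entry; registering (30, 40) afterwards finds (27, 37) similar and replaces it with (27, 40); registering (60, 70) adds a second, separate entry.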

The analysis interval classification unit compares the optimal analysis interval with the analysis interval memory. If an analysis interval similar to the optimal analysis interval is in the memory, it passes the analysis interval from the memory through the analysis interval processing unit to the peak detection unit and outputs the second control signal corresponding to a recognition target; if there is no similar analysis interval, the analysis interval of the peak detection unit remains the predetermined analysis interval.

Meanwhile, the speech input is analyzed by the speech analysis unit in a manner suited to the processing in the matching unit, and the matching unit matches it against previously registered data to obtain a recognition output. The matching process is controlled so that it is executed only when the first control signal indicates that a speech signal is present and the second control signal indicates that the input corresponds to a recognition target.

Embodiments of the present invention are described below with reference to the drawings.

FIG. 3 shows a block diagram of the speech signal processing apparatus according to the first embodiment of the present invention. As shown in the figure, the cepstrum calculation unit 81 calculates the cepstrum of the speech input and supplies it to the peak detection unit 82; the peak detection unit 82 detects the cepstrum peak within the analysis interval input from the analysis interval setting unit 84 and supplies it to the speech discrimination unit 83 and the analysis interval setting unit 84. The speech discrimination unit 83 determines the presence or absence of speech from the cepstrum peak supplied from the peak detection unit 82 and obtains a speech detection output. The analysis interval setting unit 84 calculates the optimal analysis interval corresponding to the cepstrum peak supplied from the peak detection unit 82 and supplies it to the analysis interval classification unit 85; in addition, in response to the mode setting input and as instructed by the analysis interval classification unit, it supplies the peak detection unit 82 with either analysis interval data supplied from the analysis interval memory 86 or predetermined analysis interval data. The analysis interval classification unit 85 compares the optimal analysis interval data with the analysis interval data stored in the analysis interval memory 86 to perform classification and, according to the mode setting input, either stores data in the analysis interval memory 86 or reads the analysis interval memory 86 to control the analysis interval.

The operation of the above configuration is as follows.

For a speech input, the cepstrum calculation unit 81 calculates the cepstrum, the peak detection unit 82 then detects the cepstrum peak, and the speech discrimination unit 83 determines the presence or absence of speech and outputs the result as the speech detection output. Here, the peak detection unit 82 determines, in accordance with the analysis interval supplied from the analysis interval setting unit 84, the quefrency values over which the cepstrum peak is sought, and performs peak detection over them. Next, the operation of the analysis interval setting unit 84, the analysis interval classification unit 85, and the analysis interval memory 86 is described with reference to FIG. 4.

FIG. 4 shows the cepstrum obtained by the cepstrum calculation unit 81; the vertical axis is the cepstrum level and the horizontal axis is quefrency. P₁ and P₂ denote the quefrency values of cepstrum peaks obtained by the peak detection unit 82, and the intervals (a₀-b₀), (a₂-b₂), and (a₃-b₃) denote analysis intervals output by the analysis interval setting unit 84, the analysis interval memory 86, and the analysis interval classification unit 85, respectively. First, when the mode setting input is the "registration" mode, the analysis interval setting unit 84 gives the peak detection unit 82 the widest interval, (a₀-b₀), as the analysis interval for peak detection. Suppose that, for a speech input, a cepstrum with a peak at quefrency P₁, shown by the solid line in the figure, is obtained from the peak detection unit 82. The analysis interval setting unit 84 calculates for quefrency P₁ an optimal analysis interval (a₃-b₃) narrower than the analysis interval (a₀-b₀) and supplies it to the analysis interval classification unit 85. The analysis interval classification unit 85 compares the optimal analysis interval with the analysis interval data in the analysis interval memory 86. If there is no analysis interval that includes the optimal analysis interval at or above a predetermined ratio (defined as a similar analysis interval), it stores the optimal analysis interval (a₃-b₃) in the analysis interval memory 86; if there is a similar analysis interval, it replaces that similar analysis interval with the synthesized analysis interval described next and stores it. The synthesized analysis interval is an analysis interval that includes the portion where the optimal analysis interval and the analysis interval in the memory overlap, and whose lower and upper limits are contained in one or the other of those analysis intervals.

Next, when the mode setting becomes the "recognition" mode with the analysis interval (a₃-b₃) stored in the memory, the analysis interval setting unit 84 gives the peak detection unit 82 the predetermined analysis interval (a₀-b₀) or a wider analysis interval from the memory.

Now suppose that, as shown by the dotted line in FIG. 4, a cepstrum with a peak at quefrency P₁ is obtained from the peak detection unit 82 for a speech input. The analysis interval setting unit 84 calculates the analysis interval (a₃-b₃) according to P₁, and the analysis interval classification unit 85 checks the analysis interval memory 86 for an analysis interval similar to (a₃-b₃); since one exists in this case, the analysis interval (a₃-b₃) is supplied from the memory 86 to the peak detection unit 82. Because the analysis interval for peak detection in the peak detection unit 82 is then confined to the neighborhood of the peak, the peak detection can be performed at high speed. When there is a speech input with a peak at quefrency P₂, the analysis interval setting unit 84 calculates the optimal analysis interval (a₂-b₂), and the analysis interval classification unit 85 looks for an interval similar to this optimal analysis interval; since none exists in this case, the analysis interval supplied to the peak detection unit 82 remains (a₀-b₀).
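The recognition-mode behavior in this example (use a stored interval when a similar one exists, otherwise keep the wide default) can be sketched as follows. How the optimal interval is derived from the peak is not stated in the text, so the fixed `half_width` around the peak is purely an assumption, as are the function names, the `min_ratio` default, and measuring overlap relative to the optimal interval.

```python
def select_interval(peak_q, memory, default, half_width=5, min_ratio=0.5):
    """Recognition mode: pick the analysis interval for the peak detector.

    peak_q is the detected peak quefrency, memory a list of stored
    (lower, upper) intervals, default the predetermined wide interval.
    """
    # Assumed rule: the optimal interval is a fixed window around the peak.
    lo, hi = peak_q - half_width, peak_q + half_width
    for stored in memory:
        overlap = max(0, min(hi, stored[1]) - max(lo, stored[0]))
        if overlap / (hi - lo) >= min_ratio:
            return stored          # similar interval found in memory
    return default                 # no similar interval: keep the wide one

def find_peak(cepstrum, interval):
    """Search only the given quefrency interval for the cepstrum peak."""
    lo, hi = interval
    return max(range(lo, hi + 1), key=lambda q: cepstrum[q])
```

Restricting `find_peak` to a narrow stored interval both shortens the search and ignores large cepstrum values outside it, which is the speed and noise benefit the paragraph describes.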

Thus, according to the speech signal processing apparatus of the first embodiment of the present invention, the analysis intervals derived from the voices of a plurality of people at registration are classified and set by group or individually, so the analysis interval for peak detection at recognition can be set narrowly. This speeds up the speech discrimination processing, and because the analysis intervals are classified and confined, the cepstrum peak detection is robust against noise, so accurate speech discrimination can be performed.

As is clear from the above embodiment, the speech signal processing apparatus according to the first embodiment comprises an analysis interval setting unit that calculates the optimal analysis interval corresponding to the peak output of the peak detection unit and gives the peak detection unit an analysis interval according to the mode setting input, and an analysis interval classification unit that classifies and stores the optimal analysis interval calculated by the analysis interval setting unit and the analysis intervals stored in the analysis interval memory. At registration, the voices of not just one person but a plurality of people are classified, and the analysis interval for the cepstrum peak is set for each group or individual, so the analysis interval for cepstrum detection at recognition is confined and the processing is speeded up. Moreover, because the analysis intervals are classified by individual or group, speech detection works very well at cepstrum peak detection even in the presence of noise, so accurate speech discrimination can be performed.

A second embodiment of the present invention will now be described with reference to FIG. 5.

FIG. 5 is a block diagram of a speech signal processing device according to the second embodiment of the present invention. In the figure, the cepstrum calculation unit 101 calculates the cepstrum of the voice input and supplies it to the peak detection unit 102. The peak detection unit 102 detects a peak in the cepstrum and supplies it to both the control unit 103 and the voice detection unit 106. The voice detection unit 106 detects the presence or absence of voice according to the presence or absence of the cepstrum peak signal supplied from the peak detection unit 102, and supplies a first control signal to the matching unit 107. According to the mode setting input, the control unit 103 either passes the cepstrum peak signal from the peak detection unit 102 to the peak value memory 104, or uses the data read from the peak value memory 104 to output a second control signal to the matching unit 107. The peak value memory 104 stores the cepstrum peak signal of the peak detection unit 102; data is written and read through the control unit 103. The voice analysis unit 105 analyzes the voice input into the data format used by the matching unit 107 and supplies the resulting analysis signal to the matching unit 107. The matching unit 107 receives the analysis signal from the voice analysis unit 105, the first control signal from the voice detection unit 106, and the second control signal from the control unit 103, and, in response to the first and second control signals, compares the analysis signal with templates to obtain a recognition output.
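The front end of this signal path (cepstrum calculation, peak detection, voice detection) can be illustrated with a minimal sketch. It is not the circuit of the patent: the naive DFT, the `eps` floor, the `threshold` value, the function names, and the synthetic impulse-train frame are all assumptions chosen for illustration.

```python
import cmath
import math


def real_cepstrum(frame, eps=1e-6):
    """Real cepstrum: inverse DFT of the log magnitude spectrum (naive O(N^2) DFT)."""
    n = len(frame)
    # Forward DFT followed by log magnitude; eps avoids log(0) (assumed value).
    log_mag = []
    for k in range(n):
        s = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        log_mag.append(math.log(abs(s) + eps))
    # Inverse DFT of the log magnitude; only the real part is kept.
    return [
        sum(log_mag[k] * cmath.exp(2j * math.pi * k * q / n) for k in range(n)).real / n
        for q in range(n)
    ]


def detect_peak(cepstrum, section, threshold):
    """Return (quefrency, value) of the largest cepstral value inside `section`,
    or None when that value does not exceed `threshold` (treated as no peak)."""
    lo, hi = section
    q = max(range(lo, hi + 1), key=lambda i: cepstrum[i])
    return (q, cepstrum[q]) if cepstrum[q] > threshold else None


# Synthetic "voiced" frame: an impulse train with a pitch period of 40 samples.
frame = [1.0 if t % 40 == 0 else 0.0 for t in range(320)]
peak = detect_peak(real_cepstrum(frame), section=(20, 60), threshold=0.5)

# Voice detection: a cepstrum peak is present, so voice is assumed present.
first_control_signal = peak is not None
```

The `section` argument plays the role of the analysis section: the peak search only visits quefrencies inside it, which is what keeps the search short and ignores peaks elsewhere.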

The operation of the above configuration is as follows. First, when the mode setting input is the "registration" mode, the cepstrum calculation unit 101 calculates the cepstrum of the voice input, the peak detection unit 102 detects the cepstrum peak and supplies it to the control unit 103, and the peak is stored in the peak value memory 104 via the control unit 103. The control unit 103 then sends the matching unit 107 a second control signal that inhibits matching. Next, when the mode setting input is the "recognition" mode, the cepstrum calculation unit 101 likewise calculates the cepstrum of the voice input and the peak detection unit 102 detects the cepstrum peak. The voice detection unit 106 determines whether voice is present according to the presence or absence of the cepstrum peak signal from the peak detection unit 102; if voice is present it sends the matching unit 107 a first control signal that enables matching, and if not it sends a first control signal that inhibits matching. At the same time, the control unit 103 compares the cepstrum peak signal from the peak detection unit 102 with the contents previously stored in the peak value memory 104. If the two quefrency values are close, it sends the matching unit 107 a second control signal that enables matching; if they are not close, it sends a second control signal that inhibits matching.

When the first and second control signals supplied from the voice detection unit 106 and the control unit 103 both indicate that matching is to be performed, the matching unit 107 compares the analysis signal from the voice analysis unit 105 with the template data, performs recognition, and outputs the result as the recognition output.
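The registration and recognition behaviour of the control unit, and the gating of the matching unit by the two control signals, can be sketched as follows. The class and function names, the representation of a peak as an integer quefrency in samples, and the `tolerance` used to decide that two quefrencies are "close" are hypothetical; the text does not specify them.

```python
class PeakController:
    """Sketch of control unit 103 together with peak value memory 104."""

    def __init__(self, tolerance):
        self.tolerance = tolerance   # hypothetical closeness bound, in samples
        self.peak_memory = []        # registered cepstrum peak quefrencies

    def process(self, mode, peak_quefrency):
        """Return the second control signal: True means 'perform matching'."""
        if mode == "registration":
            if peak_quefrency is not None:
                self.peak_memory.append(peak_quefrency)
            return False             # matching is inhibited during registration
        if peak_quefrency is None:
            return False             # no peak, nothing to compare
        # Recognition: enable matching only if a registered quefrency is close.
        return any(abs(peak_quefrency - q) <= self.tolerance for q in self.peak_memory)


def should_match(first_control_signal, second_control_signal):
    """Matching unit 107 runs only when both control signals enable it."""
    return first_control_signal and second_control_signal


ctrl = PeakController(tolerance=3)
ctrl.process("registration", 40)                                # register a speaker
accepted = should_match(True, ctrl.process("recognition", 42))  # close to 40
rejected = should_match(True, ctrl.process("recognition", 80))  # far from 40
```

In this sketch an unregistered speaker is rejected before any template comparison runs, which is the source of the time saving the embodiment describes.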

Thus, in the speech signal processing device of this embodiment, matching against the templates is performed only when the quefrency of the cepstrum peak of the voice input (which reflects the speaker's pitch frequency) is close to a previously registered one. Matching is therefore skipped for voice input from anyone other than a registered speaker, which reduces the processing time spent in the matching unit, and a rejection result is output immediately when such input is received.

Furthermore, when the speech signal processing device is implemented with a microprocessor or the like, the matching process can be kept to a minimum, so the CPU load is reduced and the freed capacity can be allotted to other processes.

It goes without saying that a recognition output indicating that the speaker is not the registered speaker can easily be produced by using the control signal of the control unit 103.

As is apparent from the above, the speech signal processing device according to the second embodiment of the present invention includes a control unit that, according to the mode setting input, either stores the peak signal output of the cepstrum peak detection unit in the peak value memory, or compares that peak signal output with the peak value memory and supplies a second control signal to the matching unit. With this configuration the matching operation is performed only when the pitch frequency of the voice input is close to a previously registered one. Matching is therefore skipped when a voice other than a registered speaker's is input, which saves that processing and yields a rejection result quickly. In addition, when the device is implemented with a microprocessor or the like, the matching process can be kept to a minimum and the CPU load is greatly reduced; the freed capacity can be allotted to other processes, which helps rationalize the CPU design.

A third embodiment of the present invention will now be described with reference to FIG. 6.

FIG. 6 is a block diagram of a speech signal processing device according to the third embodiment of the present invention. As shown in the figure, the cepstrum calculation unit 208 calculates the cepstrum of the voice input and supplies it to the peak detection unit 209, and the peak detection unit 209 detects a peak in the cepstrum and supplies it to both the analysis section processing unit 210 and the voice detection unit 214. The voice detection unit 214 detects the presence or absence of voice according to the cepstrum peak supplied from the peak detection unit 209, and supplies the matching unit 215 with a first control signal corresponding to the presence or absence of a voice signal. The analysis section processing unit 210 sets an optimum analysis section according to the cepstrum peak supplied from the peak detection unit 209 and supplies it to the analysis section classification unit 211; at the same time, according to the mode setting input, it supplies the peak detection unit 209 with either the similar analysis section data supplied from the analysis section memory 212 or predetermined analysis section data. The analysis section classification unit 211 compares the optimum analysis section data supplied from the analysis section processing unit 210 with the analysis section data supplied from the analysis section memory 212 and classifies it. According to the mode setting input, it either stores data in the analysis section memory 212 or reads the analysis section memory 212 to control the analysis section, and it supplies the classification result to the matching unit 215 as a second control signal. The voice analysis unit 213 analyzes the voice input into the format used by the matching unit 215 and supplies it to the matching unit 215. The matching unit 215 receives the analyzed voice input from the voice analysis unit 213, the first control signal from the voice detection unit 214, and the second control signal from the analysis section classification unit 211, and, according to these control signals, compares the analyzed voice input with templates to obtain a recognition output.

The cepstrum peak of the voice input is detected by the cepstrum calculation unit 208 and the peak detection unit 209, and the peak is supplied to the voice detection unit 214, which detects the presence or absence of voice. The voice detection unit 214 supplies the first control signal to the matching unit 215 according to whether voice is present. Here, the peak detection unit 209 detects the cepstrum peak within the analysis section supplied from the analysis section processing unit 210. The analysis section supplied to the peak detection unit 209 depends on the mode setting input, as described below. The voice input is also analyzed by the voice analysis unit 213 so that the matching unit 215 can perform matching. The operation is described separately for the case where the mode setting input is the "registration" mode and the case where it is the "recognition" mode.

First, when the mode setting input is the "registration" mode, the analysis section processing unit 210 sets the analysis section for peak detection in the peak detection unit 209 to a predetermined one, calculates an analysis section with high precision from the cepstrum peak obtained from the peak detection unit 209, and supplies this optimum analysis section to the analysis section classification unit 211. The analysis section classification unit 211 checks whether an analysis section similar to the optimum analysis section exists in the analysis section memory 212. If none exists, the optimum analysis section is newly stored in the analysis section memory 212. If one exists, the similar analysis section in the analysis section memory 212 and the optimum analysis section are combined as described above, and the result replaces the stored contents of the analysis section memory 212.
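The registration-mode classification can be sketched as a small update on a list of stored sections. This is an illustration under stated assumptions: an analysis section is modelled as a `(lo, hi)` quefrency range, "similar" is taken to mean that two ranges overlap, and "combined" is taken to mean the smallest range covering both. The actual similarity and combination rules are defined elsewhere in the patent and may differ.

```python
def similar(a, b):
    """Assumed similarity test: two (lo, hi) sections overlap."""
    return a[0] <= b[1] and b[0] <= a[1]


def combine(a, b):
    """Assumed combination rule: the smallest section covering both."""
    return (min(a[0], b[0]), max(a[1], b[1]))


def register_section(memory, optimum):
    """Store `optimum` as a new entry, or merge it into the first similar entry."""
    for i, stored in enumerate(memory):
        if similar(stored, optimum):
            memory[i] = combine(stored, optimum)  # replace the stored contents
            return memory
    memory.append(optimum)                        # no similar section: store it alone
    return memory


memory = []
register_section(memory, (35, 45))  # first speaker
register_section(memory, (42, 50))  # similar pitch: merged into a group section
register_section(memory, (70, 80))  # dissimilar pitch: stored individually
```

After these three registrations the memory holds one group section and one individual section, matching the "group or individual" classification the text describes.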

Next, when the mode setting input is switched to the "recognition" mode, the analysis section processing unit 210 supplies the peak detection unit 209 with the data of a previously given analysis section. The peak detection unit 209 detects the cepstrum peak of the voice input, and the analysis section processing unit 210 calculates the optimum analysis section corresponding to that peak and supplies it to the analysis section classification unit 211. The analysis section classification unit 211 checks whether a section similar to the given optimum analysis section exists in the analysis section memory 212. If one exists, the similar analysis section is supplied to the peak detection unit 209 via the analysis section processing unit 210 in place of the predetermined analysis section; if none exists, the previously given analysis section continues to be supplied to the peak detection unit 209. The analysis section classification unit 211 also supplies the matching unit 215 with a second control signal indicating whether a similar analysis section exists. In accordance with the first control signal from the voice detection unit 214 and the second control signal from the analysis section classification unit 211, the matching unit 215 performs matching against the templates when voice is actually present in the voice input and the analysis section of the cepstrum peak of the voice input is similar to a previously registered one.
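The recognition-mode lookup can be sketched in the same style. As before, sections are modelled as `(lo, hi)` ranges and similarity as overlap, both of which are assumptions; `DEFAULT_SECTION` is a hypothetical stand-in for the predetermined analysis section, whose actual value the text does not give.

```python
DEFAULT_SECTION = (20, 160)  # hypothetical predetermined analysis section


def similar(a, b):
    """Assumed similarity test: two (lo, hi) sections overlap."""
    return a[0] <= b[1] and b[0] <= a[1]


def recognize_step(memory, optimum, default=DEFAULT_SECTION):
    """Return (section to give the peak detector, second control signal)."""
    for stored in memory:
        if similar(stored, optimum):
            return stored, True   # use the registered section; input is a target
    return default, False         # keep the predetermined section; not a target


def should_match(voice_present, section_registered):
    """Matching runs only when voice is present and the section is registered."""
    return voice_present and section_registered


memory = [(35, 50), (70, 80)]
section, second_control_signal = recognize_step(memory, (38, 44))
unknown = recognize_step(memory, (100, 110))
```

Here `section` is the narrower registered range handed back to the peak detector, and `second_control_signal` is what the matching unit combines with the voice detection result.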

As described above, in the speech signal processing device of the third embodiment of the present invention, a registered voice signal has a cepstrum peak corresponding to the pitch frequency that characterizes the voice, and the analysis section corresponding to that cepstrum peak is classified and stored in memory. As a result, similar inputs among the registered voice inputs are stored in association with a combined analysis section, and the other voice inputs are stored in association with individual analysis sections. At recognition, the analysis section corresponding to the cepstrum peak of an arbitrary voice input is compared with the analysis sections registered in memory to determine whether that input has been registered. Furthermore, setting an analysis section confines the analysis for cepstrum peak detection to a limited range, which speeds up processing and allows the presence or absence of voice input to be determined efficiently. Noise that has no cepstrum peak is rejected, which prevents malfunction. And because speech recognition is performed only after the voice input has been confirmed and its registration verified, recognition is performed without wasted processing and the device can be used efficiently.

Moreover, when the device is implemented with a microprocessor or the like, this waste-free operation reduces the processing burden on the component, so more processing can be handled and the configuration can be simplified.

As is apparent from the above, in the speech signal processing device according to the third embodiment of the present invention, the matching unit, which receives a voice signal and obtains a recognition output using the analysis output of the voice analysis means, is provided with input means for a first control signal and input means for a second control signal that control execution of the recognition operation. The device has peak detection means that calculates the cepstrum of the voice signal and detects a peak within a designated analysis section, and it outputs, from the output of the peak detection means, the first control signal corresponding to the presence or absence of a voice signal. It further has means that calculates an optimum analysis section corresponding to the voice input, classifies analysis sections on the basis of the optimum analysis section, stores them in memory, and supplies them to the peak detection unit. In recognition processing of an arbitrary voice input, the analysis section corresponding to the voice input is compared with the stored analysis sections and the second control signal is output. The first and second control signals restrict recognition so that it is executed only when a voice signal is present and that voice signal is a recognition target. Consequently there is no waste in recognition processing. Moreover, setting the analysis section makes the analysis for cepstrum peak detection fast, and noise that has no cepstrum peak is rejected, which prevents malfunction. Because recognition is performed without waste, the device can be used efficiently.

This waste-free operation also reduces the processing burden on the device's components, so the configuration can be simplified.

Claims

A speech signal processing device comprising: a cepstrum calculation unit that receives voice input and calculates its cepstrum; a peak detection unit that detects a peak of the cepstrum within a designated analysis section; a voice discrimination unit that obtains a voice detection output from the peak detection output; an analysis section setting unit that calculates an optimum analysis section from the peak detection output and designates an analysis section to the peak detection unit; an analysis section memory that stores analysis sections classified on the basis of the optimum analysis section; and an analysis section classification unit that, in response to a mode setting input, designates to the peak detection unit the analysis section specified by the analysis section setting unit, and, in response to the mode setting input, collates the optimum analysis section with the analysis section memory and instructs the analysis section setting unit accordingly.

A speech signal processing device comprising: a voice analysis unit that analyzes voice input and outputs an analysis signal; a cepstrum calculation unit that calculates and outputs the cepstrum of the voice input; a peak detection unit that detects a peak of the cepstrum and outputs a peak signal upon detection; a voice detection unit that determines the presence or absence of voice from the peak signal and outputs a first control signal; a peak value memory that stores the peak signal; a control unit that records the peak signal in the peak value memory in response to the "set" mode of a mode setting input, and, in response to the "recognition" mode of the mode setting input, compares the peak signal in the memory with the cepstrum peak signal of the voice input and outputs a second control signal according to the difference between their quefrencies; and a matching unit that, in response to the first control signal and the second control signal, compares the analysis signal of the voice analysis unit with a template and produces a recognition output.

A speech signal processing device comprising: a voice analysis unit that analyzes voice input and outputs an analysis signal; a cepstrum calculation unit that calculates and outputs the cepstrum of the voice signal; a peak detection unit that detects and outputs a peak of the cepstrum within a designated section; a voice detection unit that outputs, from the output of the peak detection unit, a first control signal corresponding to the presence or absence of a voice signal; an analysis section processing unit that designates an analysis section to the peak detection unit and calculates and outputs an optimum analysis section corresponding to the cepstrum peak; an analysis section memory that stores analysis sections classified on the basis of the optimum analysis section; an analysis section classification unit that, in response to a mode setting input, designates to the peak detection unit the analysis section specified by the analysis section processing unit, and, in response to the mode setting input, collates the optimum analysis section with the analysis section data of the analysis section memory, outputs a second control signal indicating whether the voice signal is a recognition target, and designates an analysis section from the analysis section data of the analysis section memory to the analysis section processing unit; and a matching unit that, in response to the first control signal and the second control signal, compares the analysis signal of the voice analysis unit with a template and produces a recognition output.