KR20080061901A

KR20080061901A - Efficient Speech Recognition Method and System by Robot Input / Output Device

Info

Publication number: KR20080061901A
Application number: KR1020060137087A
Authority: KR
Inventors: 이희공; 정현철; 박성주; 신경철
Original assignee: 주식회사 유진로봇
Priority date: 2006-12-28
Filing date: 2006-12-28
Publication date: 2008-07-03

Abstract

A method and a system for efficient speech recognition using the input/output devices of a robot are provided. Out-of-vocabulary words are handled effectively by filtering out silence and noise, and speed is improved by saving the resources and time otherwise spent processing silence. A touch sensor is operated by the user. A microphone receives the user's voice. A speech recognition system (3) recognizes the user's voice input through the microphone. A speech recognition system control module controls the speech recognition system according to events generated by operating the touch sensor. An LED lamp (5) displays the operating state of the speech recognition system. A speaker outputs the voice information recognized by the speech recognition system. A user guide providing unit lets the user indirectly learn how to speak to the robot through feedback on the voice input results. A microphone volume adjusting function (2) adjusts the input volume of the microphone when speech recognition succeeds. A speaker volume adjusting function (7) adjusts the volume of the speaker through operation of the touch sensor.

Description

Efficient Speech Recognition Method and System Using Robot Input/Output Devices {System and Method of Efficient Speech Recognition by Input/Output Device of Robot}

FIG. 1 is an overall configuration diagram of the speech recognition system according to the present invention.

FIG. 2 is a flowchart of the speech recognition engine algorithm according to the present invention.

<Explanation of symbols for the main parts of the drawings>

Touch sensor (1), voice input device (2), speech recognition system (3), speech recognition system control module (4), LED lamp (5), LCD display unit (6), voice output device (7), user guide providing unit (8), endpoint detector (10), feature extractor (11), noise removal filter (12), vocabulary recognition means (13), post-processing means (14), grammar file (15), grammar (16), phoneme model DB (17), phoneme model dictionary (18)

The present invention relates to an efficient speech recognition method and system using the input/output devices of a robot. More specifically, the speech recognition engine is controlled in an event-driven manner by a touch sensor on the robot; an LED provided on the robot's ear indicates the operating state of the speech recognition engine through its on/off state; an LCD display unit shows whether recognition succeeded once it has ended; and an event generated by the touch sensor is passed to the speech recognition engine control module, which takes exclusive use of the microphone resource, attempts speech recognition, and outputs the recognized speech.

A conventional service robot with a speech recognition function perceives a user's voice command and performs a specific behavior. Such robots generally adopt what is feasible at the current level of technology: "speaker-independent isolated word recognition" (recognizing a fixed set of words regardless of the speaker) or "speaker-independent keyword recognition" (recognizing specific words within a sentence).

These methods store a voice DB in advance for the words predefined inside the robot. When a voice input of a particular word arrives from outside, it is compared with the robot's stored word DB and the best-matching result is output.

Among the various AIBO models, the one that supports the most voice commands is the Party AIBO. To recognize many kinds of voice commands, the commands are arbitrarily divided into several modes, and each mode is entered by pressing the switches mounted on the soles of the AIBO's feet in an arbitrarily defined way.

For example, the movement mode has the voice commands forward, backward, right, left, well done, and stop; the soccer mode has kick, lie down, goal, and nice save; and the game mode has rock, paper, and scissors. The voice commands recognized thus differ from mode to mode.

Voice commands are separated by mode because, given the limits of speech recognition technology, recognizing a large number of commands at once lowers recognition accuracy and lengthens recognition time.

However, although dividing the commands into modes reduces the number of commands recognized at one time, the user must press arbitrarily assigned switches to set the mode, which is hard to understand intuitively. In addition, an LCD or LED must be fitted to show the current mode, which raises the cost.

Research on speech recognition technology aimed at solving these problems is ongoing. Prior patents on speech recognition technology are reviewed below.

Japanese Publication No. 1994-034181 relates to a speech recognition method that has a low probability of misrecognition between partially similar words and requires little standard-pattern memory capacity and computation. It comprises: means for dividing an input speech signal in time; means for obtaining a plurality of time-series acoustic parameters from the time-divided speech signal; means for obtaining acoustic parameter distances between time points from the time-series acoustic parameters; means for dividing the resulting series of acoustic parameter distances into fixed lengths and obtaining, from the time-series acoustic parameters, the acoustic parameters at the division points as speech recognition parameters; means for matching the time-normalized speech recognition parameters so obtained against previously registered standard parameters; and means for outputting the result of the matching.

Korean Patent Application No. 10-2004-0109128 relates to an intelligent robot speech recognition service apparatus and method using domain-specific language models and dialogue models. It comprises a speech recognition unit that performs speech recognition on a user's spoken query, a dialogue processing unit that generates a response sentence using the dialogue model corresponding to the query, a speech synthesis unit that converts the response sentence into speech and outputs it to the user, and a display unit that receives the response sentence from the dialogue processing unit and displays it to the user.

Korean Patent Application No. 10-2001-0005875 relates to a robot control apparatus and method with a speech recognition function, intended to give children greater attachment to and interest in robot toys. It includes: speech recognition means installed in a dog-shaped robot to receive and recognize external speech; a microprocessor unit that analyzes the recognized speech and controls the overall operation of the robot; pulse width modulation means that generates a pulse-width-modulated signal under the control of the microprocessor unit; motor driving means that controls the robot's motion according to the pulse-width-modulated signal; an infrared sensor group that supplies its detection signals to the microprocessor unit so that hazards such as collisions and falls are detected and prevented while the robot moves, and so that the robot can get up by itself when it falls to the left or right; and voice generating means that outputs a voice signal corresponding to the robot's motion through a speaker. The robot's motion is thereby controlled by voice command.

Korean Patent Application No. 10-2001-0059001 relates to a security apparatus and method for a robot toy with a speech recognition function. It includes: a transmitter that sends code values for voice commands and security-mode data; first high-frequency receiving means installed in the robot to receive the transmitted data; an infrared sensor group that detects collisions, falls, and tipping to the left or right; human body sensing means that detects a human body while the robot moves freely; a microprocessor that controls the overall operation of the robot based on the data received by the first high-frequency receiving means and the detection signals from the infrared sensor group and the human body sensing means; motor driving means that, under the control of the microprocessor, drives a plurality of motors installed in the leg joints and neck joint of the robot to perform free movement and reactive motions; high-frequency transmitting means that receives alarm data from the microprocessor and transmits it when a human body is detected while the security mode is set; and an alarm device installed near the robot that receives the transmitted alarm signal, sounds an alarm, and calls a pre-designated number to deliver a voice message. The robot's motion is thereby controlled by voice command.

Korean Patent Application No. 10-2002-0013641 relates to a speech recognition apparatus and method for an intelligent robot that subdivides the gain level of the speech signal, selects a speech signal with no clipping and a high signal-to-noise ratio, and performs recognition on it, thereby maximizing the recognition rate regardless of the distance to the sound source. It comprises: microphone units, a predetermined number on each of the left and right sides, that receive a speech signal and amplify it with a plurality of fixed gains subdivided into predetermined levels; an analog/digital conversion unit that converts each of the amplified signals output from the microphone units into a digital signal; a buffer unit that stores each of the digital signals output from the analog/digital conversion unit; a speech processing unit that receives the digital signals stored in the buffer unit and processes them as speech signals; and a speech recognition unit that recognizes a voice command from the signal output by the speech processing unit.

Korean Patent Application No. 10-2001-0080831 relates to a speech recognition method for a robot that recognizes user commands effectively by using a separate speech recognition DB for each posture of the robot. It consists of a first step of switching to speech recognition mode when a voice command is received from outside while the robot is in autonomous mode; a second step of determining the robot's current posture and selecting the word DB for that posture; and a third step of, if a word matching the voice command exists in the selected word DB, executing the command with reference to the robot's current state and then returning to autonomous mode.

As described above, owing to the limits of current speech recognition technology, recognition is restricted to words registered in advance. In addition, noise-robust speech recognition requires that the resources needed for recognition and computation be concentrated effectively.

To solve these problems, rather than trying to improve the performance of the speech recognition engine itself, where technical progress is relatively slow, the present invention makes combined use of devices that are generally already mounted on a robot: a touch sensor that signals the start and end of the speech engine so as to minimize resource use and errors, an ear LED that indicates when to speak, a facial expression display and a speech synthesizer that respond to the recognition result, and microphone and speaker devices adjusted according to external noise and the loudness of the user's voice. The object of the invention is thereby to provide an efficient speech recognition method and system using the input/output devices of a robot.

In the present invention, the speech recognition engine is controlled in an event-driven manner by the robot's touch sensor; the operating state of the engine can be seen from the on/off state of the LED attached to the robot's ear; an event generated by the touch sensor is passed to the speech recognition engine control module, which takes exclusive use of the microphone resource to perform recognition; and the recognition result is output on the LCD screen.

The present invention further adds a microphone volume control function (2) that adjusts the microphone input volume and a speaker volume control function (7) that adjusts the speaker volume.

When voice input succeeds, the microphone volume is raised slightly if the input voice was somewhat quiet and lowered slightly if it was judged somewhat loud, so that recognition adapts to the situation.

During speech recognition, re-entry of the sound coming from the speaker (including speech) into the microphone is minimized in order to reduce external noise.

The efficient speech recognition system using the input/output devices of a robot according to the present invention comprises a touch sensor (1) operable by the user, a microphone (not shown) for the user's voice input, a speech recognition system (3) that recognizes the user's voice input through the microphone, a speech recognition system control module (4) that controls the speech recognition system according to events generated by operating the touch sensor (1), an LED lamp (5) that displays the operating state of the speech recognition system (3), a speaker (not shown) that outputs the voice information recognized by the speech recognition system (3), and a user guide providing unit (8) that lets the user indirectly learn how to speak to the robot through feedback on the voice input results.

The system further includes a microphone volume control function (2) that adjusts the input volume of the microphone when speech recognition succeeds, and a speaker volume control function (7) that adjusts the volume of the speaker through operation of the touch sensor (1).

If necessary, the system may also include speech recognition result display means (6), such as an LCD, to show whether recognition succeeded once it has ended.

The speech recognition system (3) is characterized by speaker-independent isolated word recognition, a voice DB matching scheme, treatment of an entire sentence as a single isolated word, and a distinction between registered and unregistered words.

More specifically, it comprises an endpoint detector (10) that detects only the speech segments in the incoming signal; a feature extractor (11) that extracts feature parameters useful for recognition from the speech signal detected by the endpoint detector (10); a noise removal filter (12) that removes components of the input speech signal that degrade recognition performance (breath sounds, lip sounds, and the like); vocabulary recognition means (13) that checks whether the input speech signal corresponds to the recognition vocabulary; and post-processing means (14) that decides whether to accept or reject the recognized result using a confidence measure computed from the likelihood of the positive model (the ordinary recognition model) aligned during recognition and the likelihood of the anti-model (filler model).

The LED lamp (5) is mounted on the robot's ear and is turned on and off according to the user's operation of the touch sensor, giving the user the exact moment at which to start speaking.

The LCD display unit (6) presents the progress and result of speech recognition. For example, it shows one facial expression while recognition is being attempted and others when recognition succeeds or fails.

The user guide lets the user indirectly learn how to speak to the robot through feedback on the result, for example "I'm not sure I understood," "Your voice is too loud," or "Your voice is too quiet."

The efficient speech recognition method using the input/output devices of a robot according to the present invention comprises: a step in which the user generates a speech recognition event by operating the robot's touch sensor; a step of switching the LED lamp on the robot's ear so that the operating state of the speech recognition engine, which is controlled according to the event, is visible; a step of passing the speech recognition event to the speech recognition engine control module, which takes exclusive use of the microphone resource and attempts recognition; and a voice input step in which the user inputs a voice command through the robot's microphone.

Once speech has been input in the voice input step, the method further comprises: a speech segment detection step (End Point Detection) that detects only the speech segments in the incoming signal; a feature extraction step (Feature Extraction) that extracts feature parameters useful for recognition from the input speech signal; a noise removal step that removes components of the input speech signal that degrade recognition performance (breath sounds, lip sounds, and the like); a vocabulary recognition step that checks whether the input speech signal corresponds to the recognition vocabulary; a post-processing step that decides whether to accept or reject the recognized result using a confidence measure computed from the likelihood of the positive model (the ordinary recognition model) aligned during recognition and the likelihood of the anti-model (filler model); and an output step that outputs the corresponding word when recognition succeeds.

In the present invention, speech recognition is started and ended in an event-driven manner by the touch sensor. While recognition is in progress, the ear LED indicates this so that the user can tell that the speech recognition engine is running. After recognition ends, the LCD window shows whether it succeeded.

In addition, an event generated by the touch sensor is passed to the speech recognition engine control module, which takes exclusive use of the microphone resource and attempts recognition, while a stop command is issued to the speech synthesis engine control module.
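The event-driven control described above can be sketched as a small state holder. This is a minimal illustration only: the class name `RecognitionController`, its fields, and the assumption that a second touch ends recognition and releases the microphone are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RecognitionController:
    """Hypothetical stand-in for the control module (4)."""
    listening: bool = False
    led_on: bool = False            # ear LED state
    mic_owner: str = "none"         # component currently holding the microphone
    tts_active: bool = True         # whether speech synthesis may play
    log: List[str] = field(default_factory=list)

    def on_touch(self) -> None:
        """Handle one touch-sensor event: start if idle, stop if running."""
        if not self.listening:
            self.tts_active = False        # stop speech synthesis first
            self.mic_owner = "recognizer"  # recognizer takes exclusive microphone use
            self.led_on = True             # LED tells the user to start speaking
            self.listening = True
            self.log.append("start")
        else:
            self.listening = False
            self.led_on = False
            self.mic_owner = "none"        # assumed: microphone is released on stop
            self.tts_active = True
            self.log.append("stop")
```

Keeping the LED, microphone ownership, and synthesis state in one place makes it easy to check that they always change together on each event.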

Hereinafter, the present invention is described in more detail with reference to the accompanying drawings.

FIG. 1 is an overall configuration diagram of the speech recognition system according to the present invention, and FIG. 2 is a detailed configuration diagram of the speech recognition system according to the present invention. The figures show a touch sensor (1), voice input device (2), speech recognition system (3), speech recognition system control module (4), LED lamp (5), LCD display unit (6), voice output device (7), user guide providing unit (8), endpoint detector (10), feature extractor (11), noise removal filter (12), vocabulary recognition means (13), post-processing means (14), grammar file (15), grammar (16), phoneme model DB (17), and phoneme model dictionary (18).

As shown in FIG. 1, the efficient speech recognition system using the input/output devices of a robot according to the present invention comprises a touch sensor (1) operable by the user, a microphone (not shown) for the user's voice input, a speech recognition system (3) that recognizes the user's voice input through the microphone, a speech recognition system control module (4) that controls the speech recognition system according to events generated by operating the touch sensor (1), an LED lamp (5) that displays the operating state of the speech recognition system (3), a speaker (not shown) that outputs the voice information recognized by the speech recognition system (3), and a user guide providing unit (8) that lets the user indirectly learn how to speak to the robot through feedback on the voice input results.

The system is further characterized by including a microphone volume control function (2) that adjusts the input volume of the microphone when speech recognition succeeds, and a speaker volume control function (7) that adjusts the volume of the speaker through operation of the touch sensor (1).

If necessary, the system may also include speech recognition result display means (6), such as an LCD, to show whether recognition succeeded once it has ended.

The speech recognition system (3) is characterized by speaker-independent isolated word recognition, a voice DB matching scheme, treatment of an entire sentence as a single isolated word, and a distinction between registered and unregistered words.

More specifically, as shown in FIG. 2, it comprises an endpoint detector (10) that detects only the speech segments in the incoming signal; a feature extractor (11) that extracts feature parameters useful for recognition from the speech signal detected by the endpoint detector (10); a noise removal filter (12) that removes components of the input speech signal that degrade recognition performance (breath sounds, lip sounds, and the like); vocabulary recognition means (13) that checks whether the input speech signal corresponds to the recognition vocabulary; and post-processing means (14) that decides whether to accept or reject the recognized result using a confidence measure computed from the likelihood of the positive model (the ordinary recognition model) aligned during recognition and the likelihood of the anti-model (filler model).
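The accept/reject decision of the post-processing means (14) can be illustrated with a frame-normalized log-likelihood ratio, which is one common form of confidence measure. The specification does not give a formula, so the ratio, the function name `accept_result`, and the threshold value below are assumptions made purely for illustration.

```python
def accept_result(model_loglik: float, anti_loglik: float,
                  n_frames: int, threshold: float = 0.5) -> bool:
    """Accept a recognition result if its confidence reaches the threshold.

    model_loglik: log-likelihood of the utterance under the recognition model.
    anti_loglik:  log-likelihood of the same utterance under the anti-model
                  (filler model).
    The confidence is the per-frame log-likelihood ratio (an assumed form).
    """
    if n_frames <= 0:
        raise ValueError("n_frames must be positive")
    confidence = (model_loglik - anti_loglik) / n_frames
    return confidence >= threshold
```

A result that the filler model explains almost as well as the recognition model gets a low ratio and is rejected, which is how an out-of-vocabulary utterance would be filtered under this assumed measure.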

The vocabulary recognition means (13) recognizes vocabulary by looking up the grammar (16) from the grammar file (15) and the phonemes stored in the phoneme model DB (17) through the phoneme model dictionary (18).

The microphone volume control function (2) adjusts the microphone input volume when speech recognition succeeds. If the voice level of the successful input was somewhat low, the microphone volume is raised slightly; if it was judged somewhat high, the microphone volume is lowered slightly. The system thus adapts to the situation so that recognition remains effective.
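A minimal sketch of this adjustment follows. The level bounds, step size, and 0 to 100 volume range are hypothetical values chosen for illustration; the specification only says the volume is raised or lowered "slightly".

```python
def adjust_mic_volume(current_volume: int, input_level: float,
                      low: float = 0.3, high: float = 0.7, step: int = 5,
                      min_volume: int = 0, max_volume: int = 100) -> int:
    """Return the new microphone volume after a successful recognition.

    input_level is the normalized level (0.0 to 1.0) of the recognized
    utterance; low, high, and step are assumed tuning values.
    """
    if input_level < low:
        current_volume += step      # voice was somewhat quiet: raise slightly
    elif input_level > high:
        current_volume -= step      # voice was somewhat loud: lower slightly
    # Clamp to the valid range of the (hypothetical) mixer.
    return max(min_volume, min(max_volume, current_volume))
```

Calling this only after successful recognitions, as the text describes, keeps failed or noisy inputs from pulling the gain in the wrong direction.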

The speaker volume control function (7) lets the user adjust the speaker volume by operating the touch sensor (1). Its purpose is to reduce external noise by minimizing re-entry of the sound coming from the speaker (including speech) into the microphone during speech recognition. In practice, the speaker volume is set to 0.
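Setting the speaker volume to 0 for the duration of recognition can be sketched with a context manager. The `Speaker` class is a hypothetical stand-in for the robot's audio output, and restoring the previous volume afterwards is an assumption; the specification only states that the volume is set to 0.

```python
from contextlib import contextmanager


class Speaker:
    """Hypothetical speaker with a settable volume (0 to 100)."""

    def __init__(self, volume: int = 60) -> None:
        self.volume = volume


@contextmanager
def muted(speaker: Speaker):
    """Mute the speaker while recognition runs, then restore the old volume."""
    saved = speaker.volume
    speaker.volume = 0              # prevents speaker output re-entering the mic
    try:
        yield speaker
    finally:
        speaker.volume = saved      # assumed: volume is restored afterwards
```

Usage would wrap the recognition call, for example `with muted(speaker): ...`, so the volume is restored even if recognition raises an error.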

The efficient speech recognition method using the input/output devices of a robot according to the present invention comprises: a step in which the user generates a speech recognition event by operating the robot's touch sensor; a step of switching the LED lamp on the robot's ear so that the operating state of the speech recognition engine, which is controlled according to the event, is visible; a step of passing the speech recognition event to the speech recognition engine control module, which takes exclusive use of the microphone resource and attempts recognition; and a voice input step in which the user inputs a voice command through the robot's microphone.

Once speech has been input in the voice input step, the method further comprises: a speech segment detection step (End Point Detection) that detects only the speech segments in the incoming signal; a feature extraction step (Feature Extraction) that extracts feature parameters useful for recognition from the input speech signal; a noise removal step that removes components of the input speech signal that degrade recognition performance (breath sounds, lip sounds, and the like); a vocabulary recognition step that checks whether the input speech signal corresponds to the recognition vocabulary; a post-processing step that decides whether to accept or reject the recognized result using a confidence measure computed from the likelihood of the positive model (the ordinary recognition model) aligned during recognition and the likelihood of the anti-model (filler model); and an output step that outputs the corresponding word when recognition succeeds.

If necessary, the method may also include a step of showing whether recognition succeeded on the robot's speech recognition result display means.

The voice section detection step detects only the speech section from the input signal. Because it is the first stage and delivers the signal used for recognition, it is a very important process, and it must be implemented with an algorithm that runs close to real time so that it does not affect the overall recognition speed. The step distinguishes speech sections from silent sections by extracting statistical features that are relatively simple to compute from the input signal. The commonly used method computes an energy value for every frame of the input signal and compares it with a threshold determined in advance from statistics to separate speech sections from silent sections. This method performs very well in environments with a relatively high signal-to-noise ratio (SNR), but its performance drops markedly in noisy environments. The zero crossing rate (ZCR) is computed together with the energy value and used as a feature, in order to take frequency characteristics into account. Because the characteristics of ambient noise change over time, a process that adaptively adjusts the threshold accordingly is indispensable. For reliable recognition, it is important to minimize the cases in which a speech section is wrongly classified as a silent section (false rejection), even if this increases the tendency to classify silent sections as speech sections (false acceptance). Accordingly, the recognition process must be able to treat a falsely accepted section as an out-of-vocabulary (OOV) word or to classify it as silence or noise.
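The energy and zero-crossing-rate test with an adaptive threshold can be sketched as follows. This is a minimal illustration, not the engine's actual algorithm: the function names, the multiplicative thresholds, the smoothing constant `alpha`, and the assumption that the first frame contains only background noise are all hypothetical choices made for the example.

```python
import math


def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / max(len(frame) - 1, 1)


def detect_speech_frames(frames, energy_factor=3.0, zcr_threshold=0.3,
                         zcr_energy_factor=1.5, alpha=0.9):
    """Label each frame True (speech) or False (silence).

    The noise floor is tracked adaptively: it is updated only on frames
    judged silent, so the thresholds follow slowly changing ambient noise.
    All numeric defaults are illustrative assumptions.
    """
    noise_floor = frame_energy(frames[0])  # assume the first frame is background
    labels = []
    for frame in frames:
        energy = frame_energy(frame)
        zcr = zero_crossing_rate(frame)
        # Clearly louder than the background: speech.
        loud = energy > energy_factor * noise_floor
        # Moderately louder but with many sign changes: likely an unvoiced
        # consonant, which the energy test alone would reject.
        noisy_consonant = zcr > zcr_threshold and energy > zcr_energy_factor * noise_floor
        is_speech = loud or noisy_consonant
        if not is_speech:
            # Adapt the noise floor with exponential smoothing.
            noise_floor = alpha * noise_floor + (1 - alpha) * energy
        labels.append(is_speech)
    return labels


if __name__ == "__main__":
    silence = [0.01, -0.01] * 80
    voiced = [0.5 * math.sin(2 * math.pi * 5 * n / 160) for n in range(160)]
    print(detect_speech_frames([silence, voiced, silence]))
```

The second, more permissive test leans toward false acceptance rather than false rejection, in line with the priority stated above; frames that are wrongly accepted are left for the later recognition and post-processing stages to reject.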

The feature extraction step extracts feature parameters effective for recognition from the input voice signal. When several people pronounce the same word, the voice waveforms are not identical even though the meaning of the word is the same, and even when the same person pronounces the same word repeatedly in immediate succession, the waveforms are not identical. The reason is that a voice waveform contains, in addition to the semantic information of the speech, information such as the speaker's timbre and emotional state. Feature extraction is therefore the process of extracting from the speech the information that conveys its meaning.

The noise removal step removes factors that degrade recognition performance (breathing sounds, lip sounds, etc.) from the input voice signal. In general, such factors include the out-of-vocabulary (OOV) problem, in which the utterance is not part of the recognition vocabulary, and the noise problem. Noise can be divided into background noise from the surrounding environment and vocal noise such as breathing and lip sounds. Background noise can be handled by the noise removal function applied when feature parameters are extracted, but noise produced by the speaker occurs at the start or in the middle of an utterance, so a human noise filler model is applied to improve the recognition rate.

The post-processing step decides whether to accept or reject the recognized result using a confidence measure based on the similarity of the positive model (general recognition model) aligned by the recognition process and the similarity of the anti-model (filler model). The confidence measure indicates how trustworthy a voice recognition result is: for the recognized phoneme or word, it is a value relative to the probability that the utterance came from some other phoneme or word. If the confidence value is larger than a set threshold, the recognition result is accepted; if it is smaller, the result is rejected. Typically the positive model uses ordinary triphones and the anti-model uses the context-independent model of the corresponding triphone; to reduce the computational cost of recognition, however, diphones are sometimes used for the positive model and a Gaussian mixture model (GMM) for the anti-model. Besides allowing early termination, this post-processing step can prevent the misrecognition that would result from outputting the most similar word when the user utters an out-of-vocabulary (OOV) word that is not in the recognition vocabulary.
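The accept/reject decision can be sketched as a threshold on a log-likelihood ratio between the positive model and the anti-model. The specification describes the confidence measure only as a relative value, so the per-frame normalized log-likelihood ratio used here, the function names, and the threshold value are assumptions chosen for illustration.

```python
def confidence_measure(positive_loglik, anti_loglik, num_frames):
    """Per-frame log-likelihood ratio of the positive model to the anti-model.

    positive_loglik -- log-likelihood of the utterance under the recognized
                       word's model (positive model)
    anti_loglik     -- log-likelihood under the anti-model (filler model)
    num_frames      -- number of frames, used to normalize for duration
    """
    return (positive_loglik - anti_loglik) / num_frames


def accept_result(positive_loglik, anti_loglik, num_frames, threshold=0.5):
    """Accept the recognized word only if its confidence exceeds the threshold.

    The default threshold is a hypothetical value; in practice it would be
    tuned on held-out data.
    """
    return confidence_measure(positive_loglik, anti_loglik, num_frames) > threshold


if __name__ == "__main__":
    # In-vocabulary utterance: the positive model fits far better than the filler.
    print(accept_result(-100.0, -180.0, 80))
    # OOV utterance: both models fit about equally, so the result is rejected.
    print(accept_result(-100.0, -110.0, 80))
```

Rejecting low-confidence results is what keeps an OOV utterance from being reported as the most similar in-vocabulary word.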

In order to solve the above problems, rather than trying to improve the performance of the voice recognition engine itself, where technical progress is relatively slow, the present invention makes combined use of several devices commonly mounted on a robot: a touch sensor that signals the start and end of the voice engine in order to minimize resource use and errors, an ear LED that indicates when to speak, facial expressions and a speech synthesizer that respond to the recognition result, and microphone and speaker devices adjusted according to external noise or the loudness of the user's voice. This enables efficient voice recognition by the input/output devices of the robot. In addition, taking into account that the algorithm must relax the classification threshold for misrecognition in order to raise the chance of successful recognition, the invention raises the probability of recognition as far as possible while lowering the likelihood of a final misrecognition. By filtering out silence and noise it handles unregistered words effectively, and by saving the resources and time spent on processing silence it improves speed.

Claims

An efficient voice recognition system by the input/output devices of a robot, comprising: a contact sensor (1) operable by a user; a microphone (not shown) for the user's voice input; a voice recognition system (3) that recognizes the user's voice input through the microphone; a voice recognition system control module (4) that controls the voice recognition system according to an event generated by operating the contact sensor (1); an LED lamp (5) that indicates the operating state of the voice recognition system (3); a speaker (not shown) that outputs the voice information recognized by the voice recognition system (3); and a user guide providing unit (8) that lets the user indirectly learn how to speak through user feedback on the voice input result,

wherein a microphone volume adjusting function (2) that adjusts the input volume of the microphone when voice recognition succeeds and a speaker volume adjusting function (7) that adjusts the volume of the speaker through the contact sensor (1) are provided.

The voice recognition system of claim 1, further comprising voice recognition result display means (6), such as an LCD, for displaying whether recognition succeeded after voice recognition ends.

The voice recognition system of claim 1, wherein the voice recognition system (3) comprises: an end point detector (10) that detects only the speech section from the input signal; a feature extractor (11) that extracts feature parameters effective for recognition from the voice signal detected by the end point detector (10); a noise removing filter (12) that removes factors degrading recognition performance (breathing sounds, lip sounds, etc.) from the input voice signal; vocabulary recognition means (13) that checks whether the input voice signal corresponds to the recognition vocabulary; and post-processing means that decides whether to accept or reject the recognized result using a confidence measure based on the similarity of the positive model (general recognition model) aligned by the recognition process and the similarity of the anti-model (filler model).

The voice recognition system of claim 3, wherein the vocabulary recognition means (13) recognizes the vocabulary by retrieving the grammar (16) from the grammar file (15) and the phonemes stored in the phoneme model DB (17) through a phoneme model dictionary (18).

An efficient voice recognition method by the input/output devices of a robot, comprising: a step in which the user generates a voice recognition event by operating the contact sensor of the robot; a step of blinking the LED lamp formed on the robot's ear so that the operating state of the voice recognition engine, which is controlled according to the voice recognition event, can be seen; a step of delivering the voice recognition event to the voice recognition engine control module so that voice recognition is attempted with exclusive use of the microphone resource; and a voice input step in which the user inputs a voice command through the microphone unit of the robot,

and, when a voice is input through the voice input step, further comprising: a voice section detection step of detecting only the speech section from the input signal; a feature extraction step of extracting feature parameters effective for recognition from the input voice signal; a noise removing step of removing factors that degrade recognition performance (breathing sounds, lip sounds, etc.) from the input voice signal; a vocabulary recognition step of checking whether the input voice signal corresponds to the recognition vocabulary; a post-processing step of deciding whether to accept or reject the recognized result using a confidence measure based on the similarity of the positive model (general recognition model) aligned by the recognition process and the similarity of the anti-model (filler model); and an output step of outputting the corresponding word when voice recognition succeeds.

The method of claim 5, further comprising a step of displaying whether voice recognition succeeded on the voice recognition result display means of the robot.

The method of claim 5, wherein the voice section detection step distinguishes speech sections from silent sections by extracting statistical features that are relatively simple to compute from the input signal.

The method of claim 5, wherein the voice section detection step further includes a recognition process that, when a silent section has been classified as a speech section, treats it as an out-of-vocabulary word or classifies it as silence or noise.

The method of claim 5, wherein the noise removing step applies a human noise filler model.

The method of claim 5, wherein the post-processing step uses a confidence measure based on the similarity of the positive model aligned by the recognition process and the similarity of the anti-model.