KR101590724B1

KR101590724B1 - Method for modifying error of speech recognition and apparatus for performing the method

Info

Publication number: KR101590724B1
Application number: KR1020140133992A
Authority: KR
Inventors: 이근배; 송재윤; 김용희; 구상준; 최준휘; 류성한
Original assignee: 포항공과대학교 산학협력단
Priority date: 2014-10-06
Filing date: 2014-10-06
Publication date: 2016-02-02
Anticipated expiration: 2034-10-06

Abstract

A method and apparatus for correcting speech recognition errors are disclosed. The method includes: converting a user utterance into text by performing speech recognition on the utterance; detecting an error word in the text using part-of-speech information and pronunciation information for each word constituting the text; generating, from the text, a first word arrangement pattern that contains the error word, and correcting the error word by matching the first word arrangement pattern against a second word arrangement pattern extracted from a pre-built word arrangement model; and correcting an error word in the corrected text based on a pre-built recurrent neural network model. The method thereby improves speech recognition performance and is easy to apply to a variety of systems that use speech recognition.

Description

TECHNICAL FIELD [0001] The present invention relates to a method for correcting speech recognition errors and an apparatus for performing the method.

The present invention relates to speech recognition technology and, more particularly, to a method for correcting speech recognition errors in a user utterance using a pre-built word arrangement model and a recurrent neural network model, and to an apparatus for performing the method.

With the recent spread of user terminals such as smartphones, tablet PCs, and PDAs (Personal Digital Assistants) and advances in information processing technology, dialogue systems that provide interaction between a user and a user terminal by recognizing and processing the user's speech on the terminal are being commercialized.

When a user utterance is received, a dialogue system analyzes the utterance using a corpus and provides the user with a suitable response. A dialogue system therefore requires speech recognition technology that receives the user's utterance, converts it into the corresponding text, and determines the user's intention from the converted text.

In particular, because speech recognition is a key technology that determines the performance and accuracy of a dialogue system, research on improving speech recognition performance by correcting recognition errors in user utterances is being actively pursued.

To improve speech recognition performance, the usual approach has been to modify the speech recognition algorithm to suit the purpose of the dialogue system. This conventional approach, however, is neither economical nor efficient.

Accordingly, a method was proposed that builds an error corpus, which stores the errors that occur when user utterances are converted into text, and a correct-answer corpus, which stores the correct text corresponding to user utterances, and uses both corpora to correct speech recognition errors in user utterances. This approach is also limited, however, in that the error corpus must be rebuilt for each intended use of the dialogue system.

To address this, a method was proposed that corrects speech recognition errors in user utterances by matching word arrangements contained in the correct-answer corpus, without building an error corpus. However, when a speech recognition error does not match any word arrangement in the correct-answer corpus, the error cannot be corrected.

An object of the present invention, devised to solve the above problems, is to provide a speech recognition error correction method that improves speech recognition performance by correcting speech recognition errors in user utterances using a pre-built word arrangement model and a recurrent neural network model.

Another object of the present invention is to provide a speech recognition error correction apparatus that, by improving speech recognition performance using a pre-built word arrangement model and a recurrent neural network model, is easy to apply to a variety of systems that use speech recognition and can correct speech recognition errors in user utterances economically and efficiently.

To achieve the above object, a speech recognition error correction method according to one aspect of the present invention is performed in a user terminal capable of digital signal processing and includes: converting a user utterance into text by performing speech recognition on the utterance when the utterance is detected; detecting an error word in the text using part-of-speech information and pronunciation information for each word constituting the text; generating, from the text, a first word arrangement pattern that contains the error word, and correcting the error word by matching the first word arrangement pattern against a second word arrangement pattern extracted from a pre-built word arrangement model; and, when an error word is detected in the text in which the error word has been corrected, correcting the error word in the corrected text based on a pre-built recurrent neural network model.

Here, detecting the error word in the text may include: extracting, from a corpus, part-of-speech information and pronunciation information for each word constituting the text; when correct text corresponding to the user utterance is input by the user operating the user terminal, extracting, from the corpus, part-of-speech information and pronunciation information for each word constituting the correct text; and detecting the error word in the text by comparing the part-of-speech and pronunciation information of the words of the text with those of the words of the correct text.

Here, correcting the error word in the text may include: extracting a correction candidate that includes the words located before or after the error word, and generating, from the text, a first word arrangement pattern that contains the correction candidate; and extracting at least one second word arrangement pattern from the pre-built word arrangement model based on the first word arrangement pattern.

Here, correcting the error word in the text may include calculating a similarity score by matching the pronunciation information of the first word arrangement pattern against the pronunciation information of each of the at least one second word arrangement pattern, and correcting the error word in the text using a second word arrangement pattern with a high similarity score.

Here, the pre-built word arrangement model may be built by generating, from at least one example text contained in a corpus, at least one word arrangement pattern and the pronunciation information corresponding to each word arrangement pattern, and applying machine learning to them.

Here, the pre-built recurrent neural network model may include a syllable-level recurrent neural network model, built by machine learning of pronunciation information or context information for at least one word in the corpus on a syllable basis, and a word-level recurrent neural network model, built by machine learning of pronunciation information or context information for at least one word in the corpus on a word basis.

Here, correcting the error word in the corrected text may include: when an error word is detected in the text in which the error word has been corrected, computing the pronunciation information of that error word syllable by syllable; and generating, with the syllable-level recurrent neural network model and based on the syllable-level pronunciation information of the error word, at least one correction word candidate with which the error word may be replaced.

Here, correcting the error word in the corrected text may include: generating at least one text candidate, each containing one of the at least one correction word candidate; calculating an evaluation score for each text candidate based on the word-level recurrent neural network model; and correcting the error word in the corrected text using the correction word candidate contained in a text candidate with a high evaluation score.

To achieve the above object, a speech recognition error correction apparatus according to another aspect of the present invention is implemented in a user terminal capable of digital signal processing and includes: a speech recognition unit that converts a user utterance into text by performing speech recognition on the utterance when the utterance is detected; an error detection unit that detects an error word in the text using part-of-speech information and pronunciation information for each word constituting the text; a first error correction unit that generates, from the text, a first word arrangement pattern that contains the error word and corrects the error word by matching the first word arrangement pattern against a second word arrangement pattern extracted from a pre-built word arrangement model; and a second error correction unit that, when an error word is detected in the text in which the error word has been corrected, corrects the error word in the corrected text based on a pre-built recurrent neural network model.

According to the speech recognition error correction method and apparatus of the embodiments of the present invention described above, speech recognition performance can be improved by correcting speech recognition errors in user utterances using a pre-built word arrangement model and a recurrent neural network model.

In addition, because speech recognition performance is improved using a pre-built word arrangement model and a recurrent neural network model, the invention is easy to apply to a variety of systems that use speech recognition and can correct speech recognition errors in user utterances economically and efficiently.

FIG. 1 is a flowchart illustrating a speech recognition error correction method according to an embodiment of the present invention.
FIG. 2 is an exemplary diagram illustrating the detection of an error word in text according to an embodiment of the present invention.
FIG. 3 is an exemplary diagram illustrating the correction of speech recognition errors in a user utterance using a pre-built word arrangement model according to an embodiment of the present invention.
FIG. 4 is an exemplary diagram illustrating the correction of speech recognition errors in a user utterance using a pre-built recurrent neural network model according to an embodiment of the present invention.
FIG. 5 is a block diagram illustrating a speech recognition error correction apparatus according to an embodiment of the present invention.

The present invention may be modified in various ways and may have various embodiments; specific embodiments are illustrated in the drawings and described in detail in the detailed description. This is not intended to limit the present invention to particular embodiments, and the invention should be understood to include all modifications, equivalents, and substitutes falling within its spirit and technical scope. Like reference numerals are used for like elements in describing each drawing.

Terms such as first, second, A, and B may be used to describe various elements, but the elements should not be limited by these terms. The terms are used only to distinguish one element from another. For example, without departing from the scope of the present invention, a first element may be termed a second element, and similarly a second element may be termed a first element. The term "and/or" includes any combination of a plurality of related listed items or any one of the plurality of related listed items.

When an element is referred to as being "connected" or "coupled" to another element, it may be directly connected or coupled to that other element, but it should be understood that intervening elements may also be present. In contrast, when an element is referred to as being "directly connected" or "directly coupled" to another element, it should be understood that no intervening elements are present.

The terminology used in this application is used only to describe specific embodiments and is not intended to limit the present invention. Singular expressions include plural expressions unless the context clearly indicates otherwise. In this application, terms such as "comprise" or "have" are intended to specify the presence of the features, numbers, steps, operations, elements, parts, or combinations thereof described in the specification, and should be understood not to preclude the presence or addition of one or more other features, numbers, steps, operations, elements, parts, or combinations thereof.

Unless defined otherwise, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the related art, and are not to be interpreted in an idealized or overly formal sense unless expressly so defined in this application.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings.

FIG. 1 is a flowchart illustrating a speech recognition error correction method according to an embodiment of the present invention, and FIG. 2 is an exemplary diagram illustrating the detection of an error word in text according to an embodiment of the present invention.

FIG. 3 is an exemplary diagram illustrating the correction of speech recognition errors in a user utterance using a pre-built word arrangement model according to an embodiment of the present invention, and FIG. 4 is an exemplary diagram illustrating the correction of speech recognition errors in a user utterance using a pre-built recurrent neural network model according to an embodiment of the present invention.

Referring to FIG. 1, the speech recognition error correction method may be performed in a user terminal capable of digital signal processing.

Here, the user terminal may be an information processing device such as a smartphone, a tablet PC, a PDA (Personal Digital Assistant), a navigation device, a notebook computer, a computer, a smart home appliance, or a system robot, but it is not limited to these and may extend to any device that requires dialogue with a user.

To provide dialogue between the user and the user terminal, speech recognition technology must be implemented that receives the user's utterance, converts it into the corresponding text, and analyzes the user's intention from the converted text. In particular, because speech recognition performance can determine the accuracy of a dialogue system, research on improving speech recognition performance is being actively pursued.

Conventionally, speech recognition performance has been improved either by modifying the speech recognition algorithm or by correcting speech recognition errors in user utterances using a pre-built error corpus and correct-answer corpus, but these approaches are neither economical nor efficient.

To solve these problems of the prior art, the present invention proposes a method that corrects speech recognition errors in user utterances economically and efficiently by using a pre-built word arrangement model and a recurrent neural network model.

The speech recognition error correction method according to the present invention may include: converting a user utterance into text by performing speech recognition on the utterance (S100); detecting an error word in the text (S200); correcting the error word in the text using a pre-built word arrangement model (S300); and correcting an error word in the corrected text using a pre-built recurrent neural network model (S400).
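The control flow of steps S100 to S400 can be sketched as follows. This is only an illustrative outline under stated assumptions: the function `correct_recognition` and its four callable parameters are hypothetical names, and the text does not prescribe any particular interface. The sketch reflects that the second correction stage runs only when an error word is still detected after the first stage.

```python
from typing import Callable, List

Words = List[str]


def correct_recognition(
    audio: bytes,
    recognize: Callable[[bytes], Words],                      # S100 (hypothetical interface)
    detect_errors: Callable[[Words], List[int]],              # S200: positions of error words
    fix_with_patterns: Callable[[Words, List[int]], Words],   # S300: word arrangement model
    fix_with_rnn: Callable[[Words, List[int]], Words],        # S400: recurrent neural network model
) -> Words:
    """Run the four steps in order; each stage is supplied by the caller."""
    text = recognize(audio)                       # S100: utterance -> text
    errors = detect_errors(text)                  # S200: find error words
    if not errors:
        return text
    text = fix_with_patterns(text, errors)        # S300: first correction
    remaining = detect_errors(text)               # check the corrected text again
    if remaining:
        text = fix_with_rnn(text, remaining)      # S400: second correction
    return text
```

Passing the stages in as callables keeps the sketch independent of any specific recognizer or model.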

First, when a user utterance is detected, speech recognition is performed on the utterance to convert it into text (S100).

More specifically, speech recognition of a user utterance means converting the speech signal of the utterance into character information that the user terminal can process, so that the terminal can understand the utterance and provide a service that matches the user's intention.

However, depending on the environment in which the user terminal recognizes the utterance, or on how each word in the uttered sentence is pronounced, an error may occur in speech recognition, so that the utterance is converted into text that differs from what the user said.

Because the text converted from the user utterance may thus contain errors, an error word can be detected using the part-of-speech information and pronunciation information for each word constituting the converted text (S200).

To detect an error word in the text, part-of-speech information and pronunciation information for each word constituting the text may first be extracted from a corpus.

Here, a corpus is a collection of language data gathered in computer-readable form for language research, and may contain at least one example text for this purpose. The corpus may store in advance the part-of-speech information and pronunciation information for each word constituting the example text.

Here, part-of-speech information may mean the classification of each word of the example text by function, form, or meaning, such as noun, pronoun, verb, adjective, adverb, preposition, conjunction, or interjection, and pronunciation information may mean phonetic symbols recorded in written form so that the sound a user produces for each word of the example text can be identified.

To detect an error word in the text, speech recognition is performed on the user utterance, which may be converted into text consisting of the words 'W₁', 'W₂', 'W₈', 'W₄', 'W₅', 'W₆', as shown in FIG. 2(a). The part-of-speech information 'POS₁', 'POS₂', 'POS₈', 'POS₄', 'POS₅', 'POS₆' and the pronunciation information 'Pron₁', 'Pron₂', 'Pron₃', 'Pron₄', 'Pron₅', 'Pron₆' for the words constituting the converted text (21) can then be extracted from the corpus.

In addition, when the correct text (23) corresponding to the user utterance consists of the words 'W₁', 'W₂', 'W₃', 'W₄', 'W₅', 'W₆', as shown in FIG. 2(b), the part-of-speech information 'POS₁', 'POS₂', 'POS₃', 'POS₄', 'POS₅', 'POS₆' and the pronunciation information 'Pron₁', 'Pron₂', 'Pron₃', 'Pron₄', 'Pron₅', 'Pron₆' for the words constituting the correct text (23) can be extracted from the corpus.

Comparing the part-of-speech and pronunciation information of each word in the converted text (21) with that of each word in the correct text (23) shows that the word 'W₈' of the converted text (21) and the word 'W₃' of the correct text (23) have the same pronunciation information, 'Pron₃', but different part-of-speech information. On this basis, 'W₈' can be detected as the error word (25) in the converted text (21).

For example, when the correct text "For one hour, Continue to search" is converted to "For one our, Continue to search" as the speech recognition result for the user utterance, 'hour' in the correct text and 'our' in the converted text have the same pronunciation information, but their part-of-speech information may differ: 'hour' is a noun, whereas 'our' is an adjective. From this, 'our' can be detected as an error word in the converted text.

Although a word with the same pronunciation information but different part-of-speech information has been described here as an example of error word detection, the invention is not limited to this case; a word with the same part-of-speech information but different pronunciation information, or a word whose part-of-speech information and pronunciation information both differ, may also be detected as an error word.
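The comparison described above can be sketched in Python using the symbolic example of FIG. 2. This is a minimal sketch under assumptions: the `LEXICON` table stands in for the corpus lookup, the names `WordInfo` and `detect_error_words` are hypothetical, and the two texts are assumed to have the same number of words so that they can be compared position by position.

```python
from typing import Dict, List, NamedTuple


class WordInfo(NamedTuple):
    pos: str   # part-of-speech information
    pron: str  # pronunciation information


# Hypothetical stand-in for the corpus, using the symbols of FIG. 2.
# 'W8' shares its pronunciation ('Pron3') with 'W3' but has a different part of speech.
LEXICON: Dict[str, WordInfo] = {
    "W1": WordInfo("POS1", "Pron1"),
    "W2": WordInfo("POS2", "Pron2"),
    "W3": WordInfo("POS3", "Pron3"),
    "W4": WordInfo("POS4", "Pron4"),
    "W5": WordInfo("POS5", "Pron5"),
    "W6": WordInfo("POS6", "Pron6"),
    "W8": WordInfo("POS8", "Pron3"),
}


def detect_error_words(recognized: List[str], correct: List[str]) -> List[int]:
    """Return positions whose part of speech or pronunciation differs from the correct text."""
    errors = []
    for index, (rec, ref) in enumerate(zip(recognized, correct)):
        # A mismatch in either field (or both) marks the word as an error word.
        if LEXICON[rec] != LEXICON[ref]:
            errors.append(index)
    return errors


converted_text = ["W1", "W2", "W8", "W4", "W5", "W6"]  # text (21)
correct_text = ["W1", "W2", "W3", "W4", "W5", "W6"]    # text (23)
error_positions = detect_error_words(converted_text, correct_text)  # [2], i.e. 'W8'
```

Comparing the whole `WordInfo` tuple covers all three cases named in the text: differing part of speech, differing pronunciation, or both.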

When an error word is detected in the text, it can be corrected using the pre-built word arrangement model (S300).

Here, the pre-built word arrangement model may be built by generating, from at least one example text contained in the corpus, at least one word arrangement pattern and the pronunciation information corresponding to each word arrangement pattern, and applying machine learning to them.

For example, from the example text "This is an example sentence" in the corpus, word arrangement patterns with word arrangements such as 'This is an', 'is an example', 'an example sentence', 'This is an example', 'is an example sentence', and 'This is an example sentence' can be generated using the words that make up the example text. In general, each word arrangement pattern may be generated so as to contain three to five consecutive words of the example text.
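The pattern generation just described can be sketched in a few lines of Python. This is a minimal illustration rather than the patent's implementation: the function name `word_arrangement_patterns` and its `min_len` and `max_len` parameters are assumptions, with the three-to-five-word window taken from the text, and words are split on whitespace for simplicity.

```python
from typing import List


def word_arrangement_patterns(text: str, min_len: int = 3, max_len: int = 5) -> List[str]:
    """Return every run of min_len..max_len consecutive words in the text."""
    words = text.split()
    patterns = []
    for length in range(min_len, max_len + 1):
        # Slide a window of the current length over the word list.
        for start in range(len(words) - length + 1):
            patterns.append(" ".join(words[start:start + length]))
    return patterns


patterns = word_arrangement_patterns("This is an example sentence")
# patterns holds the six word arrangements listed in the text above.
```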

The pronunciation information corresponding to each word of a word arrangement pattern is then mapped to the generated pattern, and the word arrangement model can be built by machine learning on the result.

To correct an error word in the text, a first word arrangement pattern that contains the error word is generated from the text, and the error word is then corrected by matching the first word arrangement pattern against a second word arrangement pattern extracted from the pre-built word arrangement model.

For example, when the error word 'W₈' is detected in text consisting of 'W₁', 'W₂', 'W₈', 'W₄', 'W₅', 'W₆', the correction candidate 'W₂-W₈-W₄' is extracted so as to include the words 'W₂' and 'W₄' located before and after the error word 'W₈', and the remaining words outside the correction candidate may be fixed. The correction candidate is extracted from the text because words located near an error word are also likely to be erroneous. A first word arrangement pattern 'W₁-W₂-W₈-W₄' that contains the correction candidate can then be generated from the text, as shown in FIG. 3(a).
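A possible way to derive the correction candidate and candidate first word arrangement patterns is sketched below. The text gives only the single pattern 'W₁-W₂-W₈-W₄', so enumerating every three-to-five-word window that covers the correction candidate is an assumption of this sketch, as are the function names `correction_candidate` and `first_patterns`.

```python
from typing import List, Tuple


def correction_candidate(words: List[str], error_index: int) -> Tuple[int, int]:
    """Return the half-open span covering the error word and its immediate neighbours."""
    start = max(error_index - 1, 0)
    end = min(error_index + 2, len(words))
    return start, end


def first_patterns(words: List[str], error_index: int,
                   min_len: int = 3, max_len: int = 5) -> List[List[str]]:
    """Enumerate windows of min_len..max_len words that contain the correction candidate."""
    cand_start, cand_end = correction_candidate(words, error_index)
    patterns = []
    for length in range(min_len, max_len + 1):
        for start in range(len(words) - length + 1):
            # Keep only windows that fully cover the correction candidate.
            if start <= cand_start and cand_end <= start + length:
                patterns.append(words[start:start + length])
    return patterns


text = ["W1", "W2", "W8", "W4", "W5", "W6"]
span = correction_candidate(text, 2)   # (1, 4), i.e. 'W2-W8-W4'
candidates = first_patterns(text, 2)   # includes ['W1', 'W2', 'W8', 'W4']
```

Among the enumerated windows, `['W1', 'W2', 'W8', 'W4']` corresponds to the first word arrangement pattern of FIG. 3(a).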

At least one second word array pattern can be extracted from the pre-built word array model based on the first word array pattern. For example, the pre-built word array model can be searched for a word array pattern similar to the first word array pattern, or for a word array pattern whose pronunciation information matches the pronunciation information 'Pron₁-Pron₂-Pron₃-Pron₄' corresponding to the first word array pattern 'W₁-W₂-W₈-W₄'. As shown in FIG. 3(b), this search can yield a second word array pattern A having the word array 'W₁-W₂-W₃-W₄' and a second word array pattern B having the word array 'W₁-W₂-W₉-W₄'.

The pronunciation information 'Pron₁-Pron₂-Pron₃-Pron₄' corresponding to the first word array pattern can then be matched against the pronunciation information 'Pron₁-Pron₂-Pron₃-Pron₄' corresponding to second word array pattern A and the pronunciation information 'Pron₁-Pron₂-Pron₉-Pron₄' corresponding to second word array pattern B, and a similarity score can be calculated for each.

Here, the word 'W₈' in the first word array pattern 'W₁-W₂-W₈-W₄' and the word 'W₃' in second word array pattern A 'W₁-W₂-W₃-W₄' are different words, but both have the pronunciation information 'Pron₃', so the pronunciation information corresponding to the two word array patterns is expressed identically.

The similarity score can be calculated using the Levenshtein distance. The Levenshtein distance measures how many changes are needed to turn one string into another; here it can be calculated from the number of change steps needed to correct the error word into the correct word.

That is, as shown in FIG. 3(c), the similarity score between the first word array pattern and second word array pattern A may be calculated as 80 points, and the similarity score between the first word array pattern and second word array pattern B may be calculated as 56 points.

Thus, the error word 'W₈' included in the text can be corrected to 'W₃' using second word array pattern A, which has the higher similarity score.
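The matching step can be sketched with a standard dynamic-programming Levenshtein distance over pronunciation sequences. The specification does not give the formula that turns a distance into the 80-point and 56-point scores of FIG. 3(c), so the length-normalized `similarity` function below is an assumption and does not reproduce those figures; only the ranking logic is illustrated.

```python
from typing import Dict, Sequence


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def similarity(a: Sequence, b: Sequence) -> float:
    """Hypothetical 0..100 score: 100 means identical sequences."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / longest)


first = ["Pron1", "Pron2", "Pron3", "Pron4"]
second_patterns: Dict[str, list] = {
    "A": ["Pron1", "Pron2", "Pron3", "Pron4"],
    "B": ["Pron1", "Pron2", "Pron9", "Pron4"],
}
scores = {name: similarity(first, pron) for name, pron in second_patterns.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

Comparing pronunciation units rather than raw characters is a design choice of this sketch; the same `levenshtein` function also works on plain strings.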

However, the method described above, which corrects errors in the text using the pre-built word array model, cannot correct the error word if the pre-built word array model contains no word array pattern that matches the first word array pattern generated from the text. In addition, the pre-built word array model may correct the error word to something different from what the user actually uttered.

Therefore, the text in which the error word has been corrected is checked again for error words, and if an error word is detected in the corrected text, that error word can be corrected once more based on a pre-built recursive neural network model (S400).
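The overall control flow, with the word array model applied first and the recursive neural network model used only when an error word remains, can be sketched as below. The three stage functions are passed in as callables; their names and signatures, and the toy stand-ins used in the demonstration, are hypothetical and serve only to make the sketch runnable.

```python
from typing import Callable, List

Words = List[str]


def correct_recognition(
    words: Words,
    detect: Callable[[Words], List[int]],
    correct_with_patterns: Callable[[Words, List[int]], Words],
    correct_with_network: Callable[[Words, List[int]], Words],
) -> Words:
    """Two-stage correction: pattern matching first, neural model as fallback."""
    errors = detect(words)
    if not errors:
        return words
    words = correct_with_patterns(words, errors)
    remaining = detect(words)  # re-check the corrected text
    if remaining:
        words = correct_with_network(words, remaining)
    return words


# Toy stand-ins: a leading "?" marks an error word.
def toy_detect(words: Words) -> List[int]:
    return [i for i, w in enumerate(words) if w.startswith("?")]


def toy_patterns(words: Words, errors: List[int]) -> Words:
    # Pretend the pattern model only knows how to fix "?a".
    return ["a" if w == "?a" else w for w in words]


def toy_network(words: Words, errors: List[int]) -> Words:
    return [w.lstrip("?") if i in errors else w for i, w in enumerate(words)]


result = correct_recognition(["x", "?a", "?b"], toy_detect, toy_patterns, toy_network)
print(result)
```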

Here, the pre-built recursive neural network model may include a syllable-unit recursive neural network model and a word-unit recursive neural network model.

The syllable-unit recursive neural network model can be built by machine learning, in units of syllables, the pronunciation information or context information for at least one word included in the corpus, using a back propagation algorithm.

Likewise, the word-unit recursive neural network model can be built by machine learning, in units of words, the pronunciation information or context information for at least one word included in the corpus, using a back propagation algorithm.

To correct the error word included in the corrected text once more, the pronunciation information of the error word is first calculated in units of syllables. Based on this syllable-unit pronunciation information, the syllable-unit recursive neural network model can then be used to generate at least one corrected word candidate with which the error word can be corrected.

Once at least one corrected word candidate has been generated using the syllable-unit recursive neural network model, at least one text candidate is generated by substituting each corrected word candidate for the error word. An evaluation score is then calculated for each text candidate based on the word-unit recursive neural network model, and the error word included in the corrected text can be corrected using the corrected word candidate contained in the text candidate with the highest evaluation score.

For example, referring to FIG. 4A, assume that the error word in the corrected text consists of three syllables starting at the N-th syllable. Context information covering the 1st syllable through the (N-1)-th syllable, which precede the error word, is extracted from the corrected text, and this context information is combined with the N-th syllable, where the error word begins, to extract context information covering the 1st syllable through the N-th syllable. A sigmoid function may be used to extract the context information, but the invention is not limited thereto.

The (N+1)-th syllable can then be predicted based on the context information covering the 1st syllable through the N-th syllable and the pronunciation information of the error word. A softmax function may be used to predict the next syllable.
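One way to read this step is as a single update of a simple recurrent layer: the previous context vector and the current syllable produce a new context vector through a sigmoid, and a softmax over that vector gives a distribution over the next syllable. The sketch below illustrates that reading only. The weight matrices, their sizes, and the one-hot syllable encoding are assumptions, and the pronunciation information of the error word, which the specification also feeds into the prediction, would have to be appended to the input vector `x`.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(value: float) -> float:
    return 1.0 / (1.0 + math.exp(-value))


def softmax(scores: Vector) -> Vector:
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix: Matrix, vector: Vector) -> Vector:
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def rnn_step(x: Vector, h_prev: Vector,
             w_in: Matrix, w_rec: Matrix, w_out: Matrix) -> Tuple[Vector, Vector]:
    """Return (new context vector, next-syllable probabilities)."""
    pre = [a + b for a, b in zip(matvec(w_in, x), matvec(w_rec, h_prev))]
    h = [sigmoid(p) for p in pre]      # context for syllables 1..N
    probs = softmax(matvec(w_out, h))  # distribution over syllable N+1
    return h, probs


# Toy sizes: 3 syllable types, 2 hidden units; weights are arbitrary, untrained.
x = [0.0, 1.0, 0.0]   # one-hot encoding of the N-th syllable
h_prev = [0.0, 0.0]   # context for syllables 1..N-1
w_in = [[0.1, 0.4, -0.2], [0.3, -0.1, 0.2]]
w_rec = [[0.5, -0.3], [0.2, 0.1]]
w_out = [[0.2, -0.4], [0.7, 0.1], [-0.3, 0.5]]
hidden, probs = rnn_step(x, h_prev, w_in, w_rec, w_out)
print(hidden, probs)
```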

In this way, the syllables with which the error word can be corrected are predicted, and at least one corrected word candidate can be generated. Because the error word here has three syllables, two to four syllables can be predicted to generate the corrected word candidates. However, when the number of syllables in the error word is smaller than the number of syllables in the word to be predicted, the corrected word candidate can be generated using the pronunciation information of the error word, although the invention is not limited thereto.

Once at least one corrected word candidate has been generated, at least one text candidate is generated by substituting each corrected word candidate for the error word, and, as shown in FIG. 4B, each text candidate can be evaluated based on the pre-built word-unit recursive neural network model.

More specifically, context information that includes the corrected word candidate is extracted using the corrected word candidate and the context information covering the words that precede it, and an evaluation score is calculated indicating whether the corrected word candidate is a valid correction of the error.

In this manner, an evaluation score is calculated for each text candidate, and the error word is corrected with the corrected word candidate contained in the text candidate with the highest evaluation score, thereby correcting the speech recognition error for the user utterance.
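The candidate selection described above can be sketched as follows. The scoring function stands in for the word-unit recursive neural network model; here it is a hypothetical toy that counts familiar adjacent word pairs, and the English example sentence is invented purely for illustration.

```python
from typing import Callable, List, Tuple

Words = List[str]


def select_correction(
    words: Words,
    error_index: int,
    candidates: List[str],
    score_text: Callable[[Words], float],
) -> Tuple[str, Words, float]:
    """Substitute each candidate for the error word and keep the best-scoring text."""
    best = None
    for candidate in candidates:
        text_candidate = list(words)
        text_candidate[error_index] = candidate
        score = score_text(text_candidate)
        if best is None or score > best[2]:
            best = (candidate, text_candidate, score)
    if best is None:
        raise ValueError("at least one corrected word candidate is required")
    return best


# Toy stand-in for the word-unit model: count known adjacent word pairs.
KNOWN_PAIRS = {("turn", "on"), ("on", "the"), ("the", "light")}


def toy_score(text: Words) -> float:
    return float(sum((a, b) in KNOWN_PAIRS for a, b in zip(text, text[1:])))


chosen, corrected, score = select_correction(
    ["turn", "on", "the", "lied"], 3, ["like", "light", "lied"], toy_score
)
print(chosen, corrected, score)
```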

FIG. 5 is a block diagram illustrating a speech recognition error correction apparatus according to an embodiment of the present invention.

Referring to FIG. 5, the speech recognition error correction apparatus (100) may be implemented in a user terminal capable of digital signal processing.

The speech recognition error correction apparatus (100) according to the present invention may include a speech recognition unit (110), an error detection unit (120), a first error correction unit (130), and a second error correction unit (140), and may further include a word array DB (150) and a recursive neural network DB (160).

When a user utterance is detected, the speech recognition unit (110) may perform speech recognition on the user utterance and convert it into text.

More specifically, speech recognition converts the speech signal of the user utterance into character information that the user terminal can process, so that the user terminal can understand the user utterance and provide a service corresponding to the user's intention.

The error detection unit (120) may detect an error word by using part-of-speech information and pronunciation information for each word constituting the text converted from the user utterance.

More specifically, the error detection unit (120) extracts, from a corpus, part-of-speech information and pronunciation information for each word constituting the text. When a correct-answer text corresponding to the user utterance is input by the user operating the user terminal, the error detection unit (120) also extracts, from the corpus, part-of-speech information and pronunciation information for each word constituting the correct-answer text.

The error detection unit (120) can then detect the error word included in the text by comparing the part-of-speech information and pronunciation information for each word constituting the text with the part-of-speech information and pronunciation information for each word constituting the correct-answer text.
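The comparison performed by the error detection unit can be sketched as a position-by-position check of part-of-speech and pronunciation information. The `WordInfo` record, the tag values, and the assumption that the recognized text and the correct-answer text have the same number of words in the same order are all hypothetical simplifications; the specification does not describe how the two texts are aligned.

```python
from typing import List, NamedTuple


class WordInfo(NamedTuple):
    word: str
    pos: str            # part-of-speech information
    pronunciation: str  # pronunciation information


def detect_error_words(recognized: List[WordInfo],
                       reference: List[WordInfo]) -> List[int]:
    """Return indices where part of speech or pronunciation differs.

    Assumes both texts are already aligned word by word.
    """
    return [
        index
        for index, (rec, ref) in enumerate(zip(recognized, reference))
        if (rec.pos, rec.pronunciation) != (ref.pos, ref.pronunciation)
    ]


recognized = [
    WordInfo("W1", "noun", "Pron1"),
    WordInfo("W8", "verb", "Pron3"),
    WordInfo("W4", "noun", "Pron4"),
]
reference = [
    WordInfo("W1", "noun", "Pron1"),
    WordInfo("W3", "noun", "Pron3"),
    WordInfo("W4", "noun", "Pron4"),
]
error_indices = detect_error_words(recognized, reference)
print(error_indices)
```

In this toy data the second word shares its pronunciation with the reference word but has a different part of speech, which is enough for it to be flagged.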

The first error correction unit (130) may correct the error word included in the text using the pre-built word array model. The pre-built word array model is built by generating, from at least one example text included in the corpus, at least one word array pattern and the pronunciation information corresponding to each word array pattern, and performing machine learning on them; it can then be stored in the word array DB (150).

More specifically, when an error word is detected in the text, the first error correction unit (130) extracts a correction candidate consisting of the error word and the words located before or after it, and fixes the remaining words outside the correction candidate. The correction candidate is extracted because words located near an error word are also likely to be errors. The first word array pattern containing the correction candidate can thus be extracted from the text.

At least one second word array pattern can be extracted from the pre-built word array model based on the first word array pattern. For example, the second word array pattern can be extracted by searching the pre-built word array model for a word array pattern similar to the first word array pattern, or for a word array pattern whose pronunciation information matches the pronunciation information corresponding to the first word array pattern.

A similarity score can be calculated by matching the pronunciation information corresponding to the first word array pattern with the pronunciation information corresponding to the second word array pattern. Here, the similarity score can be calculated using the Levenshtein distance. The Levenshtein distance measures how many changes are needed to turn one string into another; here it can be calculated from the number of change steps needed to correct the error word into the correct word.

The error word included in the text can then be corrected using the second word array pattern with the highest similarity score.

Accordingly, the text can be corrected once more by the second error correction unit (140).

The second error correction unit (140) checks whether an error word is detected in the text in which the error word has been corrected, and if an error word is detected in the corrected text, it can correct that error word based on the pre-built recursive neural network model.

The pre-built recursive neural network model can be stored in the recursive neural network DB (160). Here, the pre-built recursive neural network model may include a syllable-unit recursive neural network model and a word-unit recursive neural network model.

The second error correction unit (140) calculates the pronunciation information of the error word in units of syllables and, based on this syllable-unit pronunciation information, can use the syllable-unit recursive neural network model to generate at least one corrected word candidate with which the error word can be corrected.

Once at least one corrected word candidate has been generated using the syllable-unit recursive neural network model, at least one text candidate is generated by substituting each corrected word candidate for the error word. An evaluation score is then calculated for each text candidate based on the word-unit recursive neural network model, and the error word included in the corrected text can be corrected using the corrected word candidate contained in the text candidate with the highest evaluation score.

In addition, because the pre-built word array model and recursive neural network model improve speech recognition performance, the invention is easy to apply to various systems that use speech recognition, and speech recognition errors for user utterances can be corrected economically and efficiently.

Although the present invention has been described above with reference to preferred embodiments, those skilled in the art will understand that the invention can be modified and changed in various ways without departing from the spirit and scope of the invention as set forth in the claims below.

21: Converted text 23: Correct-answer text
25: Error word 100: Speech recognition error correction apparatus
110: Speech recognition unit 120: Error detection unit
130: First error correction unit 140: Second error correction unit
150: Word array DB 160: Recursive neural network DB

Claims

A method for correcting a speech recognition error, performed in a user terminal capable of digital signal processing, the method comprising:
performing speech recognition on a user utterance when the user utterance is detected, thereby converting the user utterance into text;
detecting an error word included in the text by using part-of-speech information and pronunciation information for each word constituting the text;
generating, from the text, a first word array pattern including the error word, and correcting the error word included in the text by matching the first word array pattern with a second word array pattern extracted from a pre-built word array model; and
when an error word is detected in the corrected text, correcting the error word included in the corrected text based on a pre-built recursive neural network model.

The method according to claim 1,
wherein detecting the error word included in the text comprises:
extracting, from a corpus, part-of-speech information and pronunciation information for each word constituting the text;
extracting, from the corpus, part-of-speech information and pronunciation information for each word constituting a correct-answer text when the correct-answer text corresponding to the user utterance is input by a user operating the user terminal; and
detecting the error word included in the text by comparing the part-of-speech information and pronunciation information for each word constituting the text with the part-of-speech information and pronunciation information for each word constituting the correct-answer text.

The method according to claim 1,
wherein correcting the error word included in the text comprises:
extracting a correction candidate that includes a word located before or after the error word, and generating, from the text, the first word array pattern containing the correction candidate; and
extracting at least one second word array pattern from the pre-built word array model based on the first word array pattern.

The method of claim 3,
wherein correcting the error word included in the text comprises:
calculating a similarity score by matching the pronunciation information corresponding to the first word array pattern with the pronunciation information corresponding to each of the at least one second word array pattern, and correcting the error word included in the text using the second word array pattern having a high similarity score.

The method of claim 3,
wherein the pre-built word array model
is built by generating, from at least one example text included in a corpus, at least one word array pattern and pronunciation information corresponding to each word array pattern, and performing machine learning thereon.

The method according to claim 1,
wherein the pre-built recursive neural network model comprises:
a syllable-unit recursive neural network model built by machine learning, in units of syllables, pronunciation information or context information for at least one word included in a corpus, and a word-unit recursive neural network model built by machine learning, in units of words, pronunciation information or context information for at least one word included in the corpus.

The method of claim 6,
wherein correcting the error word included in the corrected text comprises:
when an error word is detected in the text in which the error word has been corrected, calculating pronunciation information of the error word in units of syllables, and generating, using the syllable-unit recursive neural network model and based on the pronunciation information calculated in units of syllables, at least one corrected word candidate with which the error word can be corrected.

The method of claim 7,
wherein correcting the error word included in the corrected text comprises:
generating at least one text candidate, each including one of the at least one corrected word candidate; and
calculating an evaluation score for the at least one text candidate based on the word-unit recursive neural network model, and correcting the error word included in the corrected text using the corrected word candidate included in the text candidate having a high evaluation score.

9. A speech recognition error correction apparatus implemented in a user terminal capable of digital signal processing, the apparatus comprising:
a speech recognition unit that performs speech recognition on a user utterance when the user utterance is detected and converts the user utterance into text;
an error detection unit that detects an error word included in the text by using part-of-speech information and pronunciation information for each word constituting the text;
a first error correction unit that generates, from the text, a first word array pattern including the error word, and corrects the error word included in the text by matching the first word array pattern with a second word array pattern extracted from a pre-built word array model; and
a second error correction unit that, when an error word is detected in the corrected text, corrects the error word included in the corrected text based on a pre-built recursive neural network model.

The apparatus of claim 9,
wherein the error detection unit:
extracts, from a corpus, part-of-speech information and pronunciation information for each word constituting the text,
extracts, from the corpus, part-of-speech information and pronunciation information for each word constituting a correct-answer text when the correct-answer text corresponding to the user utterance is input by a user operating the user terminal, and
detects the error word included in the text by comparing the part-of-speech information and pronunciation information for each word constituting the text with the part-of-speech information and pronunciation information for each word constituting the correct-answer text.

The apparatus of claim 9,
wherein the first error correction unit:
extracts a correction candidate that includes a word located before or after the error word, generates, from the text, the first word array pattern containing the correction candidate, and
extracts at least one second word array pattern from the pre-built word array model based on the first word array pattern.

The apparatus of claim 11,
wherein the first error correction unit:
calculates a similarity score by matching the pronunciation information corresponding to the first word array pattern with the pronunciation information corresponding to each of the at least one second word array pattern, and corrects the error word included in the text using the second word array pattern having a high similarity score.

The apparatus of claim 9,
wherein the second error correction unit:
when an error word is detected in the text in which the error word has been corrected, calculates pronunciation information of the error word in units of syllables, and generates, based on the pronunciation information calculated in units of syllables, at least one corrected word candidate with which the error word can be corrected.

14. The apparatus of claim 13,
wherein the second error correction unit:
generates at least one text candidate, each including one of the at least one corrected word candidate, and
calculates an evaluation score for the at least one text candidate based on the pre-built recursive neural network model, and corrects the error word included in the corrected text using the corrected word candidate included in the text candidate having a high evaluation score.

The apparatus of claim 9,
wherein the speech recognition error correction apparatus further comprises:
a word array DB that stores the pre-built word array model, which is built by generating, from at least one example text included in a corpus, at least one word array pattern and pronunciation information corresponding to each word array pattern, and performing machine learning thereon; and
a recursive neural network DB that stores the pre-built recursive neural network model, which includes a syllable-unit recursive neural network model built by machine learning, in units of syllables, pronunciation information or context information for at least one word included in a corpus, and a word-unit recursive neural network model built by machine learning, in units of words, pronunciation information or context information for at least one word included in the corpus.