KR20230064304A

KR20230064304A - Apparatus for automatic labeling and method for self-learning of language model using the same

Info

Publication number: KR20230064304A
Application number: KR1020210149749A
Authority: KR
Inventors: 이주형
Original assignee: 주식회사 케이티
Priority date: 2021-11-03
Filing date: 2021-11-03
Publication date: 2023-05-10

Abstract

In a method by which an automatic labeling apparatus labels utterance sentences, a classifier receives training data labeled with utterance intents and is trained to output the utterance intent from the training sentence included in each item of training data. The training data is input to the trained classifier to generate a keyword phrasebook for the training sentences. When an utterance sentence with no labeled utterance intent is received, the utterance intent of the utterance sentence is output using a plurality of candidate labels that the trained classifier produces for the utterance sentence and a keyword phrasebook generated from the utterance sentence.
The classifier includes a language model trained to extract utterance intents and keywords from the training data and the utterance sentence, and an attention mechanism that receives the keywords extracted by the language model, extracts a plurality of related words for each keyword from the training data and the utterance sentence, and generates the keyword phrasebook from the keyword together with those related words.

Description

Automatic labeling device and method for labeling spoken sentences using the same {Apparatus for automatic labeling and method for self-learning of language model using the same}

The present invention relates to an automatic labeling apparatus that automatically labels unlabeled sentences and trains a deep learning model so as to improve the accuracy of pseudo labeling in semi-supervised learning, and to a method of labeling utterance sentences using the same.

Recently, research on text classification problems has been actively conducted in the field of natural language processing.

Language models such as BERT (Bidirectional Encoder Representations from Transformers) are used by companies in Korea and abroad for a variety of services. Research on building training data for deep learning models is drawing considerable attention, in particular work that uses a vast amount of unlabeled, unstructured data with a large-capacity model such as BERT in a teacher-student learning scheme that supports lightweight training.

In text classification, labeling unstructured data takes a great deal of time and cost. Because sufficient training data is often not available, related techniques such as semi-supervised learning and data augmentation are being studied to address the limited-training-data problem. A representative example is pseudo labeling, which expands the data by using a deep learning model trained on a small amount of training data to automatically assign labels to unlabeled data.

Because pseudo labeling relies only on the predictions of a deep learning model trained on a small amount of training data, it is biased toward those predictions, and the performance of the model degrades when errors occur during pseudo labeling. In particular, when the data distribution is highly imbalanced, for example when the training data consists of short sentences or there are 100 or more labels, more errors occur during pseudo labeling and the performance of the model degrades severely.

Accordingly, the present invention provides an automatic labeling apparatus, and a method of labeling utterance sentences using the same, that automatically labels a vast amount of unlabeled data and self-trains a language model with the resulting training data. To do so, it uses a keyword phrasebook generated from a self-attention score computed by a deep learning model and an external attention score computed by an attention mechanism, together with the top several predictions of the language model.

According to one aspect of the present invention for achieving the above technical object, a method by which an automatic labeling apparatus labels an utterance sentence includes:

receiving training data labeled with utterance intents and training a classifier to output the utterance intent from the training sentence included in each item of training data; inputting the training data to the trained classifier to generate a keyword phrasebook for the training sentences; and receiving an utterance sentence with no labeled utterance intent and outputting the utterance intent of the utterance sentence using a plurality of candidate labels that the trained classifier produces for the utterance sentence and a keyword phrasebook generated from the utterance sentence. The classifier includes a language model trained to extract utterance intents and keywords from the training data and the utterance sentence, and an attention mechanism that receives the keywords extracted by the language model, extracts a plurality of related words for each keyword from the training data and the utterance sentence, and generates the keyword phrasebook from the keyword together with those related words.

The outputting of the utterance intent may include comparing the prediction confidence of a first candidate label, which has the highest prediction confidence among the plurality of candidate labels generated from the utterance sentence, with a preset threshold, and, if the prediction confidence of the first candidate label is greater than the threshold, outputting the first candidate label as the correct answer for the utterance sentence and setting it as the correct-answer label of the utterance sentence.

The method may further include, if the prediction confidence of the first candidate label is less than the threshold, setting one of the plurality of candidate labels other than the first candidate label as the correct-answer label based on the keyword phrasebook generated from the utterance sentence.

The outputting of the utterance intent may include: extracting, by a self-attention mechanism included in the language model, the word with the highest self-attention score in the utterance sentence as the keyword of the utterance sentence; extracting, by the attention mechanism, at least one related word associated with the keyword extracted by the self-attention mechanism; and generating the keyword phrasebook including the keyword and the at least one related word.

The method may include, after the outputting of the utterance intent, retraining the classifier using the utterance sentence for which the correct-answer label has been set.

According to another aspect of the present invention for achieving the above technical object, an automatic labeling apparatus that labels an utterance sentence with an utterance intent includes:

a memory that contains at least one instruction and stores labeled training data, an interface that receives an unlabeled utterance sentence uttered by a speaker, and

a processor. The processor extracts a training sentence and the utterance intent for the training sentence from the training data, trains a classifier using the training sentence and the utterance intent, generates a keyword phrasebook for the training sentence with the trained classifier, and inputs the utterance sentence to the trained classifier to label the utterance sentence with its intent.

The processor may compute prediction confidences of utterance intents from the utterance sentence, extract a plurality of candidate labels, and compare the prediction confidence of a first candidate label, which has the highest prediction confidence among the extracted candidate labels, with a preset threshold to determine whether the first candidate label is the correct-answer label.

If the prediction confidence of the first candidate label is greater than the threshold, the processor may output the first candidate label as the correct answer for the utterance sentence and set it as the correct-answer label of the utterance sentence, and

if the prediction confidence of the first candidate label is less than the threshold, the processor may set one of the plurality of candidate labels other than the first candidate label as the correct-answer label based on the keyword phrasebook generated from the utterance sentence.

The processor may extract the word with the highest self-attention score in the utterance sentence as the keyword of the utterance sentence, extract at least one related word associated with the keyword, and generate the keyword phrasebook including the keyword and the at least one related word.

According to the present invention, the language model can carry out self-training using a keyword phrasebook, built for keywords extracted from sentences based on the self-attention score of the deep learning model and the attention score computed by the attention mechanism, together with the top several predictions of the language model. This improves the prediction performance of the language model.

In addition, because data can be labeled automatically using a very small amount of training data and a large amount of unstructured data, training data can be built easily.

FIG. 1 is a structural diagram of an automatic labeling apparatus according to an embodiment of the present invention.
FIG. 2 is a flowchart of a method by which the automatic labeling apparatus labels an utterance sentence according to an embodiment of the present invention.
FIG. 3 is an example of a keyword phrasebook generated according to an embodiment of the present invention.
FIG. 4 is an example of the training procedure of a general classifier.
FIG. 5 is an example of the training procedure of a classifier according to an embodiment of the present invention.
FIG. 6 is a structural diagram of a computer system according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present invention pertains can readily practice them. The present invention may, however, be embodied in many different forms and is not limited to the embodiments described herein. In the drawings, parts unrelated to the description are omitted for clarity, and similar parts are given similar reference numerals throughout the specification.

Throughout the specification, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise.

Hereinafter, an automatic labeling apparatus according to an embodiment of the present invention and a method of labeling utterance sentences using the same are described in detail with reference to the drawings.

FIG. 1 is a structural diagram of an automatic labeling apparatus according to an embodiment of the present invention.

As shown in FIG. 1, the automatic labeling apparatus 100 includes a classifier composed of a language model 110, such as BERT, and an attention mechanism 120.

When a small amount of labeled training data is input, the language model 110 outputs the label set in the training data as the correct answer. When unlabeled data is input, the language model 110 finds the correct label for that data, labels it automatically, and outputs it as labeled data.

That is, the language model 110 receives a sentence uttered by a speaker as input data and predicts a confidence for each utterance intent of that sentence. The way the language model 110 predicts per-intent confidences from a sentence is a known technique, so a detailed description is omitted here.

For labeled training data, the language model 110 outputs a correct-answer label according to the prediction confidence. For unlabeled bulk data, the language model 110 assigns labels automatically and outputs the result as labeled data.

To this end, the language model 110 is trained, using a small amount of training data for which labels are set, to output the label with the highest prediction confidence for an input sentence.

Here, the language model 110 extracts as candidate labels the n labels with the highest prediction confidence for the input sentence. The embodiment described here extracts the top five labels (top 1 to top 5) as candidate labels.
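The top-n candidate extraction described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: it assumes the classifier exposes one raw score per intent and that prediction confidence is a softmax over those scores. The function name `top_n_candidates`, the label names, and the scores are all hypothetical.

```python
import math

def top_n_candidates(logits, labels, n=5):
    """Return the n labels with the highest softmax confidence, best first."""
    # Numerically stable softmax over the raw classifier scores
    # (assumption: confidence is a softmax probability).
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    confidences = [e / total for e in exps]
    ranked = sorted(zip(labels, confidences), key=lambda pair: pair[1], reverse=True)
    return ranked[:n]

# Hypothetical intent labels and scores, for illustration only.
labels = ["wireless rate inquiry", "wired rate inquiry", "plan change",
          "device repair", "roaming", "cancellation"]
logits = [2.1, 1.9, 0.3, -0.5, -1.0, -2.0]

candidates = top_n_candidates(logits, labels, n=5)
for label, confidence in candidates:
    print(f"{label}: {confidence:.3f}")
```

The first entry of `candidates` plays the role of the first candidate label (top 1), and the remaining entries are the fallback candidates (top 2 to top 5).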

The language model 110 is trained again to find the correct answer for input data, using the keyword phrasebook output by the attention mechanism 120 described below, the input data from which the keyword phrasebook was extracted, and the candidate labels. The retrained language model 110 then automatically assigns labels to bulk data for which no label has been set.

The language model 110 contains a self-attention mechanism 111. When labeled training data is input, the self-attention mechanism 111 extracts a representative word of the input training data as its keyword. The way the self-attention mechanism 111 extracts a representative keyword from an input sentence is a known technique, and embodiments of the present invention are not limited to any one method.

The attention mechanism 120 uses the keyword extracted by the self-attention mechanism 111 and the labeled training data from which the keyword was extracted to extract at least one related word associated with the keyword. The attention mechanism 120 may use various methods to extract the related words, so embodiments of the present invention are not limited to any one method.

The attention mechanism 120 then expresses the keyword and the extracted related words as an idiom and generates a keyword phrasebook. This is intended to eliminate the labeling errors that can arise when related words are used ambiguously.

The way the attention mechanism 120 extracts related words for a representative keyword is a known technique, and embodiments of the present invention are not limited to any one method. The attention mechanism 120 lists words in the order "[keyword, related word]" to form an idiom and collects the idioms so formed into a keyword phrasebook; this can be done in various ways. If a related word itself has a further related word, the attention mechanism 120 may also generate the keyword phrasebook in a form such as [keyword, related word 1, related word 2].
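One possible in-memory layout for the "[keyword, related word 1, related word 2]" form is sketched below. The specification leaves the construction method open, so everything here is an assumption made for illustration: the phrasebook is a dictionary keyed by intent label, each idiom is a list with the keyword first, and the helper names `make_idiom` and `add_to_phrasebook` are hypothetical.

```python
def make_idiom(keyword, related_words):
    """Order the words as [keyword, related word 1, related word 2, ...]."""
    idiom = [keyword]
    for word in related_words:
        # Skip the keyword itself and repeated related words.
        if word not in idiom:
            idiom.append(word)
    return idiom

def add_to_phrasebook(phrasebook, intent, idiom):
    """Store the idiom under its intent label, skipping exact duplicates."""
    entries = phrasebook.setdefault(intent, [])
    if idiom not in entries:
        entries.append(idiom)
    return phrasebook

# Example with English stand-ins for the words in the specification.
phrasebook = {}
idiom = make_idiom("mobile phone", ["rate", "tell"])
add_to_phrasebook(phrasebook, "wireless rate inquiry", idiom)
add_to_phrasebook(phrasebook, "wireless rate inquiry", idiom)  # duplicate, ignored
print(phrasebook)
```

Keying the phrasebook by intent is a design choice made here so that the idioms of one candidate label can later be compared against an unlabeled sentence.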

Next, a method by which the automatic labeling apparatus 100 automatically labels unlabeled sentences is described with reference to FIG. 2.

FIG. 2 is a flowchart of a method by which the automatic labeling apparatus labels an utterance sentence according to an embodiment of the present invention.

As shown in FIG. 2, when labeled training data is input to the automatic labeling apparatus 100, the classifier trains the language model with the labeled training data (S100). That is, the language model is trained so that, when a sentence from the training data is input, the correct answer set as that sentence's label is output.

Using the language model trained in this way, the classifier generates a keyword phrasebook for the training data (S110). That is, the keyword phrasebook is generated from the keyword that the self-attention mechanism 111 in the language model 110 extracts from the training data and the related words that the attention mechanism 120 extracts for that keyword.

Here, the self-attention mechanism 111 extracts as the keyword the word with the highest confidence value in the sentence corresponding to the training data. The attention mechanism 120 then extracts from that sentence as many related words, associated with the keyword extracted by the self-attention mechanism 111, as there are intents.

For example, assume that "Please tell me my mobile phone rate" (핸드폰 요금 알려주세요) is input as training data, and that "wireless rate inquiry" (무선 요금 문의) is set as the correct intent of that training data.

The self-attention mechanism 111 of the language model 110 extracts "mobile phone", which has the highest self-attention score in "Please tell me my mobile phone rate", as the keyword. The attention mechanism 120 then extracts "rate" (요금) and "tell" (알려) as related words associated with the keyword "mobile phone". Accordingly, the attention mechanism 120 generates a keyword phrasebook containing [mobile phone, rate, tell].

After the language model 110 has been trained with the labeled training data or the keyword phrasebook for the training data has been generated in this way, assume that a speaker utters an arbitrary sentence and that this utterance sentence is received (S120). The utterance sentence is assumed to have no label set.

The language model 110, already trained with the training data, computes prediction confidences for the utterance sentence, extracts a plurality of candidate labels, and generates a keyword phrasebook for the utterance sentence (S130). The embodiment described here extracts five candidate labels (top 1 to top 5).

While computing prediction confidences for the utterance sentence, the language model 110 compares the prediction confidence of the label predicted as the first candidate label with a preset threshold (S140, S150). If the prediction confidence of the first candidate label is greater than the threshold, the language model 110 sets the first candidate label as the correct-answer label of the utterance sentence (S160).

However, if the prediction confidence of the first candidate label is less than the threshold, the language model 110 uses the keyword phrasebook generated in step S130 to search again for the correct-answer label among the extracted candidate labels, from the first candidate label through the fifth candidate label (S170). That is, when the prediction confidence of the first candidate label predicted by the language model 110 is less than the threshold, the idiom corresponding to the utterance sentence is looked up again in the phrasebooks of the first through fifth candidate labels.
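The decision in steps S140 to S170 can be sketched as follows. The specification does not state how an utterance's idiom is matched against the phrasebooks, so this sketch assumes a simple word-overlap count; the threshold value, the choice to return `None` when nothing matches, the handling of a confidence exactly equal to the threshold (treated as the fallback case), and the name `choose_label` are likewise assumptions for illustration.

```python
def choose_label(candidates, sentence_idiom, phrasebook, threshold=0.9):
    """Pick a label from ranked (label, confidence) candidates.

    candidates: list of (label, confidence), highest confidence first.
    sentence_idiom: [keyword, related words...] generated from the utterance.
    phrasebook: dict mapping label -> list of idioms built from training data.
    """
    top_label, top_confidence = candidates[0]
    if top_confidence > threshold:
        # S160: the top-1 prediction is trusted as is.
        return top_label

    # S170: fall back to the phrasebooks of all candidates (top 1 to top n).
    sentence_words = set(sentence_idiom)

    def overlap(label):
        # Assumed matching rule: best word overlap with any stored idiom.
        idioms = phrasebook.get(label, [])
        return max((len(sentence_words & set(i)) for i in idioms), default=0)

    # max() keeps the first maximum, so ties go to the higher-confidence label.
    best_label, _ = max(candidates, key=lambda pair: overlap(pair[0]))
    return best_label if overlap(best_label) > 0 else None

# Hypothetical candidates and phrasebook entries.
candidates = [("wired rate inquiry", 0.41), ("wireless rate inquiry", 0.38),
              ("plan change", 0.10)]
phrasebook = {
    "wired rate inquiry": [["internet", "rate", "tell"]],
    "wireless rate inquiry": [["mobile phone", "rate", "tell"]],
}
chosen = choose_label(candidates, ["now", "mobile phone", "rate"], phrasebook)
print(chosen)
```

In this example the top-1 confidence is below the threshold, and the utterance shares two words with the "wireless rate inquiry" idiom but only one with the "wired rate inquiry" idiom, so the second candidate is chosen.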

After the correct-answer label has been found in this way, the classifier is retrained using the utterance sentence for which the correct-answer label has been set.

An example of generating the keyword phrasebook in step S110 or step S130, while the language model 110 sets labels on unlabeled bulk data according to the procedure above, is described with reference to FIG. 3.

FIG. 3 is an example of a keyword phrasebook generated according to an embodiment of the present invention.

The language model 110, that is, the attention mechanism combined with BERT, is trained by assigning high scores to the words that are key to classifying a class. When the language model 110 uses the trained attention mechanism to predict an utterance sentence such as "I want to know my mobile phone rate now" (지금 휴대폰 요금 알고싶어), it assigns a high attention score to the words that are key to predicting the label.

That is, as shown in FIG. 3, the language model 110 has assigned a high attention score of 0.5 to "mobile phone".

Once an attention score has been assigned to each word of the utterance sentence in this way, the language model 110 takes the word with the highest attention score as the reference, checks the self-attention scores held in the language model 110, and extracts the top N related words by score. FIG. 3 shows an example in which three words are extracted.

That is, having extracted "mobile phone", the word given the highest score by the attention mechanism, as the keyword, the language model 110 extracts the top three words by self-attention score relative to "mobile phone" and forms the word set [now, mobile phone, rate], which is the keyword phrasebook.
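The FIG. 3 example can be reproduced with a short sketch. Only the 0.5 score for "mobile phone" comes from the text; every other number below is a made-up value chosen for illustration, and the function name `build_idiom` is hypothetical. The sketch assumes one external attention score per word and one row of self-attention scores per word.

```python
def build_idiom(tokens, attention_scores, self_attention, top_n=3):
    """Return (keyword, word set) for one sentence.

    attention_scores: one external attention score per token.
    self_attention: square matrix; row i holds token i's scores over all tokens.
    """
    # Keyword: the token with the highest external attention score.
    keyword_index = max(range(len(tokens)), key=lambda i: attention_scores[i])
    # Related words: the top_n tokens in the keyword's self-attention row.
    row = self_attention[keyword_index]
    top = sorted(range(len(tokens)), key=lambda i: row[i], reverse=True)[:top_n]
    # Keep the original word order of the sentence.
    return tokens[keyword_index], [tokens[i] for i in sorted(top)]

# English stand-ins for the words of the example sentence.
tokens = ["now", "mobile phone", "rate", "want to know"]
attention_scores = [0.2, 0.5, 0.2, 0.1]      # 0.5 is from FIG. 3; the rest are invented
self_attention = [                            # all values invented for illustration
    [0.40, 0.30, 0.20, 0.10],
    [0.25, 0.40, 0.30, 0.05],
    [0.10, 0.35, 0.45, 0.10],
    [0.15, 0.30, 0.25, 0.30],
]

keyword, idiom = build_idiom(tokens, attention_scores, self_attention, top_n=3)
print(keyword, idiom)
```

With these values the keyword is "mobile phone" and the word set is [now, mobile phone, rate], matching the example in the text. Note that the keyword's own self-attention score places it inside the top three here.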

As semi-supervised learning proceeds and the amount of training data grows, the language model 110 becomes better at computing the attention scores and self-attention scores used to build the keyword phrasebook, and a higher-quality phrasebook is built. By repeating this procedure, a keyword phrasebook is built automatically for each of the five candidate labels extracted from the utterance sentence.

That is, when the language model 110 is trained with labeled training data, a phrasebook is built for each intent. After unlabeled data has been labeled by the method according to an embodiment of the present invention, the newly labeled data is used as training data to retrain the language model 110.

Because this retraining improves the prediction performance of the language model 110, repeating the process makes the labeling of initially unlabeled data more accurate and builds a higher-quality phrasebook.
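The repeated label-then-retrain cycle can be sketched as a generic loop. This is a schematic under stated assumptions, not the claimed procedure: `train` and `auto_label` are hypothetical callables standing in for classifier training and for the threshold-plus-phrasebook labeling step, and the toy versions below (a word-to-label lookup) exist only so the loop can run end to end.

```python
def self_train(labeled, unlabeled, train, auto_label, rounds=3):
    """Alternate between labeling unlabeled sentences and retraining."""
    model = train(labeled)
    for _ in range(rounds):
        newly_labeled, remaining = [], []
        for sentence in unlabeled:
            label = auto_label(model, sentence)
            if label is None:
                remaining.append(sentence)      # still no confident label
            else:
                newly_labeled.append((sentence, label))
        if not newly_labeled:
            break                               # nothing new to learn from
        labeled = labeled + newly_labeled
        unlabeled = remaining
        model = train(labeled)                  # retrain on the enlarged set
    return model, labeled, unlabeled

# Toy stand-ins: the "model" maps each seen word to the first label it appeared with.
def toy_train(pairs):
    model = {}
    for sentence, label in pairs:
        for word in sentence.split():
            model.setdefault(word, label)
    return model

def toy_auto_label(model, sentence):
    for word in sentence.split():
        if word in model:
            return model[word]
    return None

model, final_labeled, still_unlabeled = self_train(
    [("mobile rate", "wireless")],
    ["mobile bill", "bill please", "hello"],
    toy_train, toy_auto_label,
)
print(final_labeled, still_unlabeled)
```

In the toy run, "bill please" cannot be labeled in the first round but can be in the second, after "mobile bill" has been added to the training set, which mirrors how each retraining round widens what the model can label.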

Next, an example in which a conventional language model sets labels on unlabeled bulk data and an example in which labels are set according to an embodiment of the present invention are described with reference to FIGS. 4 and 5.

FIG. 4 is an example of the training procedure of a general classifier.

As shown in FIG. 4, in the conventional approach a classifier (BERT) is trained with a small amount of labeled training data, and the trained classifier then assigns labels to a large amount of unlabeled data. The newly labeled data is added to the training data, and the classifier is retrained.

A classifier trained on a small amount of labeled data performs poorly. Moreover, when the data has very many intents or there is a very large amount of data, the classifier may fail to predict the top-1 label correctly.

For example, if intents are distinguished by specific words such as wired and wireless, as shown in FIG. 4, a conventionally trained classifier has difficulty telling them apart and providing the predicted intent clearly as the correct answer.

FIG. 5 is an example of the training procedure of a classifier according to an embodiment of the present invention.

Conventionally, unlabeled data was labeled using only the top-1 result of a deep learning model. Consequently, when the amount of data used to train the language model is small, the performance of the language model is not high, and incorrect labels are assigned to the unlabeled data.

However, as shown in FIG. 5, in the embodiment of the present invention, a keyword phrasebook is built using the self-attention score of the language model itself together with an external attention score. In addition, not only the top-1 prediction of the language model but also the candidate labels predicted as top-2 through top-5 are used, and the generated keyword phrasebook is used to select the correct answer from among these candidate labels. Therefore, in the embodiment of the present invention, when the candidate label predicted as top-1 does not fit the intent of the sentence, the correct answer can still be selected accurately from the remaining candidate labels.
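A rough sketch of this selection step is given below. The text does not specify how attention scores are represented or how phrasebooks are compared, so several details are assumptions made for illustration: the scores are plain per-token lists, the phrasebooks built from training data are stored per intent label, matching is done by counting shared words, the threshold value is 0.8, and the top-1 label is kept when no other candidate matches. The function and variable names are hypothetical.

```python
def build_phrasebook(tokens, self_attn, related_attn, num_related=2):
    """Keyword = token with the highest self-attention score; related words =
    tokens with the highest external attention scores (both passed in as lists)."""
    keyword_idx = max(range(len(tokens)), key=lambda i: self_attn[i])
    others = [i for i in range(len(tokens)) if i != keyword_idx]
    others.sort(key=lambda i: related_attn[i], reverse=True)
    return {tokens[keyword_idx]} | {tokens[i] for i in others[:num_related]}


def select_label(candidates, utterance_phrasebook, label_phrasebooks, threshold=0.8):
    """Pick an intent from up to five candidates given as (label, confidence) pairs."""
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)[:5]
    top_label, top_conf = ranked[0]
    if top_conf > threshold:
        # The top-1 prediction is confident enough to be used directly.
        return top_label
    # Otherwise compare the utterance phrasebook with each remaining candidate's
    # phrasebook and keep the candidate sharing the most words (assumed rule).
    best_label, best_overlap = top_label, 0
    for label, _ in ranked[1:]:
        overlap = len(utterance_phrasebook & label_phrasebooks.get(label, set()))
        if overlap > best_overlap:
            best_label, best_overlap = label, overlap
    return best_label


tokens = ["please", "cancel", "my", "wireless", "phone"]
self_attn = [0.05, 0.20, 0.05, 0.50, 0.20]      # illustrative scores only
related_attn = [0.10, 0.30, 0.05, 0.00, 0.55]   # illustrative scores only
utterance_phrasebook = build_phrasebook(tokens, self_attn, related_attn)

label_phrasebooks = {
    "wired_cancel": {"wired", "internet", "cable"},
    "wireless_cancel": {"wireless", "phone", "mobile"},
    "billing": {"bill", "charge"},
}
candidates = [("wired_cancel", 0.41), ("wireless_cancel", 0.38), ("billing", 0.10)]
chosen = select_label(candidates, utterance_phrasebook, label_phrasebooks)
```

In this toy case the top-1 candidate `wired_cancel` falls below the threshold, and the words `wireless` and `phone` in the utterance phrasebook steer the choice to `wireless_cancel`.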

FIG. 6 is a structural diagram of a computer system according to an embodiment of the present invention.

Referring to FIG. 6, a computer system 200 operated by at least one processor executes a program containing instructions written to carry out the operations of the present invention. The program may be stored in a computer-readable storage medium and may be distributed. Here, the structure of the computer system 200 may also be the structure of the automatic labeling device 100 according to an embodiment of the present invention.

The hardware of the computer system 200 may include at least one processor 210, a memory 220, a storage 230, and a communication interface 240, which may be connected through a bus. Hardware such as an input device and an output device may also be included. Various software, including an operating system capable of running programs, may be installed on the computer system 200.

The processor 210 is a device that controls the operation of the computer system 200 and may be any of various types of processors that process the instructions included in a program, for example, a CPU (Central Processing Unit), an MPU (Micro Processor Unit), an MCU (Micro Controller Unit), or a GPU (Graphic Processing Unit).

The memory 220 loads the program so that the instructions written to carry out the operations of the present invention are processed by the processor 210. The memory 220 may be, for example, a ROM (read only memory) or a RAM (random access memory). The storage 230 stores the various data, programs, and the like required to carry out the operations of the present invention. The communication interface 240 may be a wired/wireless communication module.

Although embodiments of the present invention have been described in detail above, the scope of the present invention is not limited thereto. Various modifications and improvements made by those skilled in the art using the basic concept of the present invention as defined in the following claims also fall within the scope of the present invention.

Claims

A method by which an automatic labeling device labels a spoken sentence, the method comprising:
training a classifier by receiving training data labeled with utterance intents, such that the classifier outputs the utterance intent from a training sentence included in each item of the training data;
generating a keyword phrasebook for the training sentence by inputting the training data into the trained classifier; and
receiving a spoken sentence that is not labeled with an utterance intent, and outputting the utterance intent of the spoken sentence using a plurality of candidate labels obtained for the spoken sentence with the trained classifier and a keyword phrasebook generated from the spoken sentence,
wherein the classifier includes:
a language model trained to extract utterance intents and keywords from the training data and the spoken sentence; and
an attention mechanism that receives the keyword extracted by the language model, extracts a plurality of words related to the keyword from the training data and the spoken sentence, and generates the keyword phrasebook together with the keyword.

The labeling method of claim 1,
wherein the outputting of the utterance intent includes:
comparing the prediction reliability of a first candidate label, which has the highest prediction reliability among the plurality of candidate labels generated from the spoken sentence, with a preset threshold; and
if the prediction reliability of the first candidate label is greater than the threshold, outputting the first candidate label as the correct answer for the spoken sentence and setting it as the correct answer label for the spoken sentence.

The labeling method of claim 2, further comprising:
if the prediction reliability of the first candidate label is less than the threshold, setting any one of the plurality of candidate labels other than the first candidate label as the correct answer label based on the keyword phrasebook generated from the spoken sentence.

The labeling method of claim 3,
wherein the outputting of the utterance intent includes:
extracting, by a self-attention mechanism included in the language model, the word with the highest self-attention score in the spoken sentence as a keyword of the spoken sentence;
extracting, by the attention mechanism, at least one related word associated with the keyword extracted by the self-attention mechanism; and
generating the keyword phrasebook including the keyword and the at least one related word.

The labeling method of claim 4, further comprising,
after the outputting of the utterance intent:
re-training the classifier using the spoken sentence for which the correct answer label has been set.

An automatic labeling device for labeling a spoken sentence with an utterance intent, the device comprising:
a memory containing at least one instruction and storing labeled training data;
an interface that receives an unlabeled spoken sentence uttered by a speaker; and
a processor,
wherein the processor
extracts a training sentence and an utterance intent for the training sentence from the training data, trains a classifier using the training sentence and the utterance intent, generates a keyword phrasebook for the training sentence with the trained classifier, and inputs the spoken sentence into the classifier to label the spoken sentence with the utterance intent for the spoken sentence.

The automatic labeling device of claim 6,
wherein the processor
calculates the prediction reliability of utterance intents from the spoken sentence to extract a plurality of candidate labels, and compares the prediction reliability of a first candidate label, which has the highest prediction reliability among the extracted candidate labels, with a preset threshold to determine whether the first candidate label is the correct answer label.

The automatic labeling device of claim 7,
wherein the processor,
if the prediction reliability of the first candidate label is greater than the threshold, outputs the first candidate label as the correct answer for the spoken sentence and sets it as the correct answer label for the spoken sentence, and
if the prediction reliability of the first candidate label is less than the threshold, sets any one of the plurality of candidate labels other than the first candidate label as the correct answer label based on the keyword phrasebook generated from the spoken sentence.

The automatic labeling device of claim 8,
wherein the processor
extracts the word with the highest self-attention score calculated in the spoken sentence as a keyword of the spoken sentence, extracts at least one related word associated with the keyword, and generates the keyword phrasebook including the keyword and the at least one related word.