KR0123238B1

KR0123238B1 - Morphemes analysis system

Info

Publication number: KR0123238B1
Application number: KR1019940031854A
Authority: KR
Inventors: 김영환; 정일형
Original assignee: 조백제; 한국전기통신공사
Priority date: 1994-11-29
Filing date: 1994-11-29
Publication date: 1997-11-21
Also published as: KR960018972A

Abstract

The present invention relates to a morphological analysis system and analysis method that exploit the structural characteristics of the Korean word (eojeol, the unit delimited by spaces), and that serve as a foundation for Korean information processing and information retrieval systems and for various Korean natural-language interfaces. The morphological analysis system comprises a user input/output terminal (11), a central processing unit (13), an electronic dictionary storage unit (14), and a connection information table storage unit (15). The analysis method comprises: a first step of splitting a continuous Hangul string into words when it is input; a second step of decomposing each separated word into graphemes from right to left to check for an ending; a third step of, if an ending is found, restoring the base form of the ending, processing the predicate stem as a multi-syllable unit, and storing the result; and a fourth step of, if no ending is found, processing the word at the syllable level and outputting the morphological analysis result together with the content stored in the third step.

Description

Morphological Analysis System and Analysis Method Using Characteristics of Word Structure

FIG. 1 is a block diagram of a morphological analysis system using word structure characteristics according to the present invention.

FIG. 2 is a detailed block diagram of the central processing unit of FIG. 1.

FIG. 3 is a flowchart of the morphological analysis process according to the present invention.

* Explanation of reference numerals for the main parts of the drawings

11 : user input/output terminal
12 : auxiliary storage device
13 : central processing unit
14 : electronic dictionary storage unit
15 : connection information table storage unit
21 : input unit
22 : morpheme separation unit
23 : output unit
24 : base-form restoration unit
25 : connection information checking unit

The present invention relates to a morphological analysis system and analysis method, and more particularly to a morphological analysis system and analysis method that exploit the structural characteristics of the Korean word (eojeol) and that serve as a foundation for Korean information processing and information retrieval systems and for various Korean natural-language interfaces.

Conventional morphological analysis methods ignore the characteristics of each word (eojeol) and process every word at the grapheme level, so analysis takes a long time.

More recently, syllable-level analysis has been attempted, but because it does not reflect the principles by which the Korean alphabet (jamo) system is constructed, its overhead is considerable and it has not proven effective. A Korean sentence is made up of words (eojeol), each word is made up of syllables, and each syllable is made up of graphemes. The syllable-level method ignores how graphemes combine to form a syllable and relies only on patterns of collected syllables. The grapheme-level method ignores the properties of a single syllable or a group of several syllables and decomposes them into graphemes even when no further decomposition is needed, which lowers the efficiency of morphological analysis.
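The syllable-to-grapheme relationship described above can be illustrated with a minimal Python sketch. It relies on the standard Unicode layout of precomposed Hangul syllables (19 initials, 21 vowels, 28 final slots starting at U+AC00), not on anything specified in the patent; the function name `decompose` is hypothetical.

```python
# Minimal sketch: split one precomposed Hangul syllable into its graphemes.
# The tables follow the standard Unicode ordering; they are not from the patent.
HANGUL_BASE = 0xAC00
CHOSEONG = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"          # 19 initial consonants
JUNGSEONG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"      # 21 vowels
JONGSEONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # none + 27 finals


def decompose(syllable):
    """Return (initial, vowel, final) for one Hangul syllable; final may be ''."""
    index = ord(syllable) - HANGUL_BASE
    if not 0 <= index < 19 * 21 * 28:
        raise ValueError("not a precomposed Hangul syllable: %r" % syllable)
    initial, rest = divmod(index, 21 * 28)
    vowel, final = divmod(rest, 28)
    return CHOSEONG[initial], JUNGSEONG[vowel], JONGSEONG[final]


print(decompose("한"))  # ('ㅎ', 'ㅏ', 'ㄴ')
```

Decomposing every syllable this way is the cost the grapheme-level method pays for all words, including those that never need it.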

The present invention was devised to solve these problems of the prior art. Its object is to provide a morphological analysis system and analysis method that use the structural characteristics of Korean words to minimize analysis time: only endings, which require grapheme-level processing, are analyzed at the grapheme level; predicate stems, which can be handled as a group of several syllables at once, are analyzed as syllable groups; and everything else is analyzed at the syllable level.

To achieve this object, the present invention comprises: a user input/output terminal that, together with an auxiliary storage device, receives the Korean input stream to be analyzed and outputs only the results that can be morphologically analyzed; a central processing unit that performs morphological analysis using the word structure characteristics of Korean; an electronic dictionary storage unit used during the central processing unit's analysis to check whether a part separated from a word is a valid morpheme; and a connection information table storage unit used by the central processing unit to check whether the separated parts of a word can be combined.

An embodiment of the present invention is described in detail below with reference to the accompanying drawings.

FIG. 1 is a block diagram of the morphological analysis system using word structure characteristics according to the present invention. The system comprises: a user input/output terminal (11) that, together with an auxiliary storage device (12), receives the Korean input stream to be analyzed and outputs only the results that can be morphologically analyzed; a central processing unit (13) that performs morphological analysis using the word structure characteristics of Korean; an electronic dictionary storage unit (14) used during the analysis by the central processing unit (13) to check whether a part separated from a word is a morpheme registered in the dictionary and to obtain information related to that morpheme; and a connection information table storage unit (15) used by the central processing unit (13) to check whether the separated parts of a word can be combined.
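As a rough illustration of how the two storage units might be consulted, the following Python sketch models the electronic dictionary (14) as a mapping from morphemes to a category and the connection information table (15) as a set of permitted category pairs. The patent does not disclose the contents or layout of either store, so every entry, category name, and function name here is an assumption.

```python
# Hypothetical stand-in for the electronic dictionary storage unit (14):
# morpheme -> category. The entries are illustrative only.
DICTIONARY = {
    "학교": "noun",
    "가": "predicate_stem",
    "에": "particle",
    "ㄴ다": "ending",
}

# Hypothetical stand-in for the connection information table storage unit (15):
# pairs of categories that are allowed to follow one another.
CONNECTIONS = {
    ("noun", "particle"),
    ("predicate_stem", "ending"),
}


def lookup(piece):
    """Return the category of a registered morpheme, or None if unregistered."""
    return DICTIONARY.get(piece)


def can_connect(left, right):
    """True if both pieces are registered and their categories may be joined."""
    left_cat, right_cat = lookup(left), lookup(right)
    if left_cat is None or right_cat is None:
        return False
    return (left_cat, right_cat) in CONNECTIONS


print(can_connect("학교", "에"))    # True
print(can_connect("학교", "ㄴ다"))  # False
```

A real dictionary would carry more information per morpheme than a single category; the single-label form is chosen only to keep the sketch short.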

FIG. 2 is a detailed block diagram of the central processing unit of FIG. 1. As shown in the figure, the central processing unit (13) comprises: an input unit (21) that segments the Korean input stream from the user input/output terminal (11) into words; a morpheme separation unit (22) that separates morphemes using the characteristics of each segmented word; a base-form restoration unit (24) that, when the segmented word is a predicate, restores the inflected form of the predicate to its base form; a connection information checking unit (25) that uses the connection information table (15) to determine whether the separated morphemes may be combined; and an output unit (23) that outputs the morphological analysis result in a form in which the separated morphemes can be connected.

The central processing unit (13) configured as above operates as follows. Because Korean words (eojeol) are written with spaces between them, the input unit (21) uses the space as the delimiter for separating words. The morpheme separation unit (22) separates each word into morphemes by exploiting three properties: the only part of a word that must be decomposed into graphemes is the ending of a predicate; a predicate divides into a stem and an ending, and once its base form has been restored the stem needs no further decomposition; and all remaining cases can be processed at the syllable level. Inflected predicates are restored by the base-form restoration unit (24), and the connection information checking unit (25) determines whether the separated morphemes may be combined and outputs the result.
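The reason endings need grapheme-level handling can be seen in a word such as 간다, where the final consonant ㄴ of the first syllable belongs to the ending ㄴ다 rather than to the stem 가. The Python sketch below shows one way to strip such an ending, assuming Unicode NFD normalization (via the standard `unicodedata` module) as the grapheme decomposition. The ending inventory and the name `split_ending` are hypothetical, and the sketch neither handles irregular conjugation nor validates the stem against a dictionary.

```python
import unicodedata

# Hypothetical ending inventory keyed by its decomposed (NFD) form.
# "\u11ab" is the syllable-final consonant ㄴ, which is fused into the last
# stem syllable in the surface form and only becomes visible after decomposition.
ENDINGS = {
    "\u11ab" + unicodedata.normalize("NFD", "다"): "ㄴ다",
    unicodedata.normalize("NFD", "습니다"): "습니다",
}


def split_ending(word):
    """Return (stem, ending) if a known ending closes the word, else None."""
    jamo = unicodedata.normalize("NFD", word)
    # Try longer endings first so that a short ending does not shadow a long one.
    for suffix in sorted(ENDINGS, key=len, reverse=True):
        if jamo.endswith(suffix) and len(jamo) > len(suffix):
            # Recompose the remaining graphemes into whole syllables.
            stem = unicodedata.normalize("NFC", jamo[: -len(suffix)])
            return stem, ENDINGS[suffix]
    return None


print(split_ending("간다"))      # ('가', 'ㄴ다')
print(split_ending("먹습니다"))  # ('먹', '습니다')
print(split_ending("학교에"))    # None
```

Once the stem has been recomposed it is treated as a single unit, which matches the observation that a restored stem needs no further decomposition.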

FIG. 3 is a flowchart of the morphological analysis process according to the present invention. When a continuous Hangul string is input (301), it is split into words using the spaces between words as delimiters (302), and each word is decomposed into graphemes from right to left in order to recognize a predicate ending (303). The separated graphemes are checked for an ending (304). If an ending is present, its base form is restored, the predicate stem is processed as a multi-syllable unit, and the result is stored (305). If no ending is present, the word as it was before grapheme decomposition is processed at the syllable level (306), and the morphological analysis result is output together with the stored content (307).
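The flow of steps 301 to 307 can be sketched end to end in Python as follows. This is a toy under stated assumptions, not the patented implementation: the stem, ending, noun, and particle inventories are invented for illustration, Unicode NFD/NFC normalization stands in for grapheme decomposition and recomposition, and the syllable-level branch simply tries noun plus optional particle splits.

```python
import unicodedata


def nfd(text):
    return unicodedata.normalize("NFD", text)


def nfc(text):
    return unicodedata.normalize("NFC", text)


# Hypothetical inventories; the patent does not list any entries.
STEMS = {"가", "먹"}
ENDINGS = {"\u11ab" + nfd("다"): "ㄴ다", nfd("는다"): "는다"}
NOUNS = {"학교", "밥"}
PARTICLES = {"에", "을"}


def analyze_word(word):
    # Steps 303-305: look for an ending at the grapheme level, then treat the
    # recomposed stem as one multi-syllable unit and look it up whole.
    jamo = nfd(word)
    for suffix, ending in ENDINGS.items():
        if jamo.endswith(suffix):
            stem = nfc(jamo[: len(jamo) - len(suffix)])
            if stem in STEMS:
                return [(stem, "stem"), (ending, "ending")]
    # Step 306: no ending found, so work on the undecomposed word at the
    # syllable level (each character of an NFC string is one syllable).
    for cut in range(len(word), 0, -1):
        head, tail = word[:cut], word[cut:]
        if head in NOUNS and (tail == "" or tail in PARTICLES):
            return [(head, "noun")] + ([(tail, "particle")] if tail else [])
    return [(word, "unknown")]


def analyze(text):
    # Steps 301-302: split the input string on spaces into words.
    # Step 307: return the per-word results together.
    return [analyze_word(word) for word in text.split()]


print(analyze("학교에 간다"))
# [[('학교', 'noun'), ('에', 'particle')], [('가', 'stem'), ('ㄴ다', 'ending')]]
```

Only the ending check touches graphemes; nouns and particles are matched on whole syllables, which is the division of labor the flowchart describes.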

As described above, the present invention analyzes words according to their structural characteristics, which minimizes analysis time and maximizes efficiency. In particular, in application systems that require morphological analysis, the burden of analysis time is minimized, which improves overall performance.

Claims

A morphological analysis system using word structure characteristics, comprising: a user input/output terminal (11) that, together with an auxiliary storage device (12), receives a Korean input stream to be morphologically analyzed and outputs only the results that can be morphologically analyzed; a central processing unit (13) that performs morphological analysis using the word structure characteristics of Korean; an electronic dictionary storage unit (14) for checking, during the morphological analysis by the central processing unit (13), whether a part separated from a word is a valid morpheme; and a connection information table storage unit (15) with which the central processing unit (13) checks whether the separated parts of a word can be combined.

The morphological analysis system using word structure characteristics according to claim 1, wherein the central processing unit (13) comprises: input means (21) for segmenting the Korean input stream received from the user input/output terminal (11) into words; morpheme separation means (22) for separating morphemes using the characteristics of each segmented word; base-form restoration means (24) for restoring the inflected form of a predicate to its base form when the segmented word is a predicate; connection information checking means (25) for determining, using the connection information table storage unit (15), whether the separated morphemes may be combined; and output means (23) for outputting the morphological analysis result in a form in which the separated morphemes can be connected.

A morphological analysis method applied to a morphological analysis system using word structure characteristics, the method comprising: a first step of, when a continuous Hangul string is input, splitting it into words using the spaces between words as delimiters; a second step of decomposing each separated word into graphemes from right to left and checking whether an ending is present; a third step of, if the check finds an ending, restoring the base form of the ending, processing the predicate stem as a multi-syllable unit, and storing the result; and a fourth step of, if the check finds no ending, processing the word as it was before grapheme decomposition at the syllable level and outputting the morphological analysis result together with the content stored in the third step.