KR0123238B1 - Morphemes analysis system - Google Patents
- Publication number
- KR0123238B1
- Authority
- KR
- South Korea
- Prior art keywords
- word
- morphological analysis
- separated
- central processing
- morpheme
Landscapes
- Machine Translation (AREA)
Abstract
The present invention relates to a morphological analysis system and analysis method that use the structural characteristics of the Korean word (eojeol, the space-delimited unit), which underlie Korean information processing and information retrieval systems and various Korean natural-language interfaces. The morphological analysis system comprises a user input/output terminal (11), a central processing unit (13), an electronic dictionary storage unit (14), and a connection information table storage unit (15). The analysis method comprises: a first step of splitting a continuous Hangul string, when it is input, into word units; a second step of separating the graphemes (jaso) of each separated word from right to left to check for an ending; a third step of, if an ending is found, restoring the base form of the ending, processing the predicate stem in multi-syllable units, and storing the result; and a fourth step of, if no ending is found, processing the word in syllable units and outputting the morphological analysis result together with the content stored in the third step.
Description
FIG. 1 is a block diagram of the morphological analysis system using word-structure characteristics according to the present invention.
FIG. 2 is a detailed block diagram of the central processing unit of FIG. 1.
FIG. 3 is a flowchart of the morphological analysis process according to the present invention.
* Explanation of reference numerals for the main parts of the drawings
11: user input/output terminal    12: auxiliary storage device
13: central processing unit    14: electronic dictionary storage unit
15: connection information table storage unit    21: input unit
22: morpheme separation unit    23: output unit
24: base-form restoration unit    25: connection information inspection unit
The present invention relates to a morphological analysis system and analysis method, and more particularly to a morphological analysis system and analysis method that use the word-structure characteristics underlying Korean information processing and information retrieval systems and various Korean natural-language interfaces.
Conventional morphological analysis methods ignore the characteristics of each word when analyzing a target word and process every word in grapheme units, so analysis takes a long time.
Syllable-unit analysis has also been attempted recently, but because it does not reflect the principles by which the Korean alphabet (jamo) system is formed, its overhead is considerable and it has not proven effective. A Korean sentence is composed of words, a word of syllables, and a syllable of graphemes. Syllable-unit analysis ignores how graphemes combine to form a syllable and relies only on collected syllable patterns. Grapheme-unit analysis ignores the properties of a single syllable or a group of several syllables and decomposes them into graphemes even where no further decomposition is needed, which lowers the efficiency of morphological analysis.
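The syllable-to-grapheme relationship described above can be illustrated with the standard Unicode arithmetic for precomposed Hangul syllables. The following is a minimal sketch, not part of the patent: the function names `decompose` and `compose` are hypothetical, and the output uses Unicode conjoining jamo code points rather than any encoding the patent may have assumed.

```python
# Standard Unicode Hangul syllable arithmetic (illustrative; not from the patent).
S_BASE = 0xAC00                               # first precomposed syllable, "가"
L_BASE, V_BASE, T_BASE = 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28                     # vowels; final consonants incl. "none"
S_COUNT = 19 * V_COUNT * T_COUNT              # 11172 precomposed syllables


def decompose(syllable):
    """Split one precomposed Hangul syllable into its conjoining jamo."""
    index = ord(syllable) - S_BASE
    if not 0 <= index < S_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    lead = index // (V_COUNT * T_COUNT)
    vowel = (index % (V_COUNT * T_COUNT)) // T_COUNT
    tail = index % T_COUNT
    parts = [chr(L_BASE + lead), chr(V_BASE + vowel)]
    if tail:                                  # tail index 0 means no final consonant
        parts.append(chr(T_BASE + tail))
    return parts


def compose(parts):
    """Inverse of decompose: rebuild the syllable from two or three jamo."""
    lead = ord(parts[0]) - L_BASE
    vowel = ord(parts[1]) - V_BASE
    tail = ord(parts[2]) - T_BASE if len(parts) == 3 else 0
    return chr(S_BASE + (lead * V_COUNT + vowel) * T_COUNT + tail)
```

Decomposing every syllable of every word this way is the cost that a purely grapheme-unit analyzer pays, while a purely syllable-unit analyzer never looks inside the syllable at all.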
The present invention was devised to solve the above problems of the prior art. Its object is to provide a morphological analysis system and analysis method that use the word-structure characteristics of Korean to minimize morphological analysis time: only endings, which require grapheme-level processing, are analyzed in grapheme units; predicate stems, which can be handled several syllables at a time, are analyzed in syllable-group units; and everything else is analyzed in syllable units.
To achieve the above object, the present invention comprises: a user input/output terminal that, together with an auxiliary storage device, receives the Korean input stream to be morphologically analyzed and outputs only the results that can be morphologically analyzed; a central processing unit that performs morphological analysis using the word-structure characteristics of Korean; an electronic dictionary storage unit for checking, during morphological analysis by the central processing unit, whether a part separated from a word is a valid morpheme; and a connection information table storage unit with which the central processing unit checks whether the separated parts of a word can be combined.
An embodiment of the present invention is described in detail below with reference to the accompanying drawings.
FIG. 1 is a block diagram of the morphological analysis system using word-structure characteristics according to the present invention. The system comprises: a user input/output terminal (11) that, together with an auxiliary storage device (12), receives the Korean input stream to be morphologically analyzed and outputs only the results that can be morphologically analyzed; a central processing unit (13) that performs morphological analysis using the word-structure characteristics of Korean; an electronic dictionary storage unit (14) for checking whether a part separated from a word during morphological analysis by the central processing unit (13) is a morpheme registered in the dictionary, and for obtaining information related to that morpheme; and a connection information table storage unit (15) with which the central processing unit (13) checks whether the separated parts of a word can be combined.
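The roles of the electronic dictionary storage unit (14) and the connection information table storage unit (15) can be sketched as two lookups. The patent does not specify either data format, so everything below is an assumption made for illustration: the entries, the part-of-speech labels, and the names `DICTIONARY`, `CONNECTION_TABLE`, `lookup`, `can_connect`, and `is_valid_split` are all hypothetical.

```python
# Hypothetical electronic dictionary: surface form -> morpheme information.
DICTIONARY = {
    "학교": {"pos": "noun"},
    "에": {"pos": "particle"},
    "먹": {"pos": "predicate_stem"},
    "었다": {"pos": "ending"},
}

# Hypothetical connection information table: allowed (left, right) category pairs.
CONNECTION_TABLE = {
    ("noun", "particle"),
    ("predicate_stem", "ending"),
}


def lookup(part):
    """Return the dictionary entry for a separated part, or None if unregistered."""
    return DICTIONARY.get(part)


def can_connect(left, right):
    """Check whether two adjacent morphemes may combine according to the table."""
    return (left["pos"], right["pos"]) in CONNECTION_TABLE


def is_valid_split(parts):
    """A split is accepted only if every part is registered and every
    adjacent pair is allowed by the connection information table."""
    entries = [lookup(p) for p in parts]
    if any(entry is None for entry in entries):
        return False
    return all(can_connect(a, b) for a, b in zip(entries, entries[1:]))
```

For example, `is_valid_split(["학교", "에"])` holds under this toy table, while `is_valid_split(["학교", "었다"])` does not, because a noun followed by a predicate ending is not a listed pair.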
FIG. 2 is a detailed block diagram of the central processing unit of FIG. 1. As shown, the central processing unit (13) comprises: an input unit (21) that segments the Korean input stream received from the user input/output terminal (11) into word units; a morpheme separation unit (22) that separates morphemes using the characteristics of each segmented word; a base-form restoration unit (24) that, when a segmented word is a predicate, restores its inflected form to the base form; a connection information inspection unit (25) that uses the connection information table (15) to determine whether the separated morphemes can validly combine; and an output unit (23) that outputs the morphological analysis result in a form in which the separated morphemes can be connected.
In operation, the input unit (21) exploits the fact that Korean words are written with spaces between them and uses the space (blank) as the word delimiter. The morpheme separation unit (22) separates morphemes using the following properties: the only part of a segmented word that must be decomposed into graphemes is the ending of a predicate; a predicate divides into a stem and an ending, and once its base form has been restored the stem needs no further separation; all remaining cases can be processed in syllable units. Inflected predicates are restored by the base-form restoration unit (24), and the connection information inspection unit (25) judges whether the separated morphemes can validly combine and outputs the result.
FIG. 3 is a flowchart of the morphological analysis process according to the present invention. When a continuous Hangul string is input (301), it is split into word units using the blanks between words as delimiters (302), and for each word the graphemes are separated from right to left in order to recognize a predicate ending (303). The separated graphemes are then checked for an ending (304). If one is present, the base form of the ending is restored, the predicate stem is processed in multi-syllable units, and the result is stored (305). If none is present, the word as it was before grapheme separation is processed in syllable units (306), and the morphological analysis result is output together with the stored content (307).
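The flow of steps 301 to 307 can be sketched as follows. This is a simplified illustration under stated assumptions, not the patented implementation: the toy lexicon, the ending patterns, and all function names are hypothetical, and grapheme separation is approximated with Unicode NFD normalization from Python's standard `unicodedata` module. Matching an ending pattern against the end of the grapheme sequence stands in for the right-to-left scan.

```python
import unicodedata


def nfd(s):
    """Decompose Hangul syllables into conjoining jamo (graphemes)."""
    return unicodedata.normalize("NFD", s)


def nfc(s):
    """Recompose conjoining jamo into precomposed syllables."""
    return unicodedata.normalize("NFC", s)


# Hypothetical toy lexicon (not from the patent).
STEMS = {"가", "먹"}
NOUNS = {"학교", "나"}
PARTICLES = {"에", "는"}

# Hypothetical ending patterns as grapheme sequences, mapped to restored forms.
# "\u11BB" is the final consonant ㅆ left on the stem syllable when
# 가 + 았다 contracts to 갔다.
ENDINGS = {
    nfd("었다"): "었다",
    "\u11BB" + nfd("다"): "았다",
}


def find_ending(word):
    """Steps 303-305: match an ending at the right edge of the grapheme
    sequence, restore its base form, and keep the stem as one unit."""
    graphemes = nfd(word)
    for pattern in sorted(ENDINGS, key=len, reverse=True):  # longest first
        if graphemes.endswith(pattern):
            stem = nfc(graphemes[: -len(pattern)])
            if stem in STEMS:                 # stem is looked up whole, not per jamo
                return [(stem, "stem"), (ENDINGS[pattern], "ending")]
    return None


def analyze_by_syllable(word):
    """Step 306: no ending found, so split only at syllable boundaries."""
    for i in range(len(word), 0, -1):
        head, tail = word[:i], word[i:]
        if head in NOUNS and (tail == "" or tail in PARTICLES):
            return [(head, "noun")] + ([(tail, "particle")] if tail else [])
    return [(word, "unknown")]


def analyze(text):
    """Steps 301-307: split on blanks, analyze each word, collect results."""
    results = []
    for word in nfc(text).split():            # step 302: blank as delimiter
        results.append(find_ending(word) or analyze_by_syllable(word))
    return results
```

In this sketch only the ending match touches graphemes. The stem is checked against the lexicon as a whole multi-syllable string, and words without an ending never leave the syllable level, which mirrors the division of labor the flowchart describes.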
As described above, the present invention analyzes words using their structural characteristics, which minimizes analysis time and maximizes efficiency. In particular, in application systems that require morphological analysis, the burden of morphological analysis time is minimized, which improves overall performance.
Claims (3)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1019940031854A KR0123238B1 (en) | 1994-11-29 | 1994-11-29 | Morphemes analysis system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1019940031854A KR0123238B1 (en) | 1994-11-29 | 1994-11-29 | Morphemes analysis system |
Publications (2)
Publication Number | Publication Date |
---|---|
KR960018972A (en) | 1996-06-17 |
KR0123238B1 (en) | 1997-11-21 |
Family
ID=19399553
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
KR1019940031854A KR0123238B1 (en) | 1994-11-29 | 1994-11-29 | Morphemes analysis system |
Country Status (1)
Country | Link |
---|---|
KR (1) | KR0123238B1 (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR100371135B1 (en) * | 1999-09-10 | 2003-02-05 | 한국전자통신연구원 | Declinable-word morphology analyzing apparatus using a declinable-word derivative-dictionary and method therefor |
KR100371134B1 (en) * | 1999-05-11 | 2003-02-05 | 한국전자통신연구원 | Method to recover root form of inflected verb based-on compound ending dictionary |
KR101117790B1 (en) * | 2009-10-29 | 2012-02-29 | 송도규 | System and Method for Morpheme analysis Using Combination Information of a Part of Speech |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR19980066877A (en) * | 1997-01-29 | 1998-10-15 | 김광호 | Morphological interpretation based on types of unregistered words |
KR20020054254A (en) * | 2000-12-27 | 2002-07-06 | 오길록 | Analysis Method for Korean Morphology using AVL+Trie Structure |
- 1994-11-29: KR application KR1019940031854A (patent KR0123238B1), status: not active, IP right cessation
Also Published As
Publication number | Publication date |
---|---|
KR960018972A (en) | 1996-06-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP3971373B2 (en) | Hybrid automatic translation system that mixes rule-based method and translation pattern method | |
EP0612018B1 (en) | Apparatus and method for syntactic signal analysis | |
JP2007265458A (en) | Method and computer for generating a plurality of compression options | |
EP1078322B1 (en) | System for creating a dictionary | |
EP0403057B1 (en) | Method of translating sentence including adverb phrase by using translating apparatus | |
EP0398513B1 (en) | Method and apparatus for translating a sentence including a compound word formed by hyphenation | |
KR0123238B1 (en) | Morphemes analysis system | |
US20050027509A1 (en) | Left-corner chart parsing | |
US6829580B1 (en) | Linguistic converter | |
US6219449B1 (en) | Character recognition system | |
KR20010075848A (en) | Apparatus and method for detecting sentence boundary using regular expression and probabilistic contextual information | |
JP2536633B2 (en) | Compound word extraction device | |
KR100617317B1 (en) | Method for re-analysis of compound noun to decide lexical entries and apparatus thereof | |
KR940022311A (en) | Machine Translation Device and Method | |
JP3932912B2 (en) | Character string shaping device, method and program | |
JPS62139076A (en) | Language analysis system | |
KR20050065193A (en) | Lexical and semantic collocation based korean parsing system and the method | |
KR20010057781A (en) | Apparatus for analysing multi-word morpheme and method using the same | |
KR19990079824A (en) | A morpheme interpreter and method suitable for processing compound words connected by hyphens, and a language translation device having the device | |
JP2765618B2 (en) | Language analyzer | |
JP2994539B2 (en) | Machine translation equipment | |
Waters et al. | Efficient word-graph parsing and search with a stochastic context-free grammar | |
JP2989824B2 (en) | Sentence pattern / grammar recognition method | |
JPH0443462A (en) | Proofreading support system after translation | |
JPH04296969A (en) | Mechanical translation device |
Legal Events
Code | Title | Description |
---|---|---|
A201 | Request for examination | |
PA0109 | Patent application | Patent event code: PA01091R01D; Comment text: Patent Application; Patent event date: 19941129 |
PA0201 | Request for examination | Patent event code: PA02012R01D; Patent event date: 19941129; Comment text: Request for Examination of Application |
PG1501 | Laying open of application | |
E701 | Decision to grant or registration of patent right | |
PE0701 | Decision of registration | Patent event code: PE07011S01D; Comment text: Decision to Grant Registration; Patent event date: 19970829 |
GRNT | Written decision to grant | |
PR0701 | Registration of establishment | Comment text: Registration of Establishment; Patent event date: 19970911; Patent event code: PR07011E01D |
PR1002 | Payment of registration fee | Payment date: 19970911; End annual number: 3; Start annual number: 1 |
PG1601 | Publication of registration | |
PR1001 | Payment of annual fee | Payment date: 20000629; Start annual number: 4; End annual number: 4 |
PR1001 | Payment of annual fee | Payment date: 20010702; Start annual number: 5; End annual number: 5 |
PR1001 | Payment of annual fee | Payment date: 20020626; Start annual number: 6; End annual number: 6 |
PR1001 | Payment of annual fee | Payment date: 20030715; Start annual number: 7; End annual number: 7 |
PR1001 | Payment of annual fee | Payment date: 20040702; Start annual number: 8; End annual number: 8 |
PR1001 | Payment of annual fee | Payment date: 20050831; Start annual number: 9; End annual number: 9 |
PR1001 | Payment of annual fee | Payment date: 20060814; Start annual number: 10; End annual number: 10 |
PR1001 | Payment of annual fee | Payment date: 20070903; Start annual number: 11; End annual number: 11 |
PR1001 | Payment of annual fee | Payment date: 20080909; Start annual number: 12; End annual number: 12 |
PR1001 | Payment of annual fee | Payment date: 20090909; Start annual number: 13; End annual number: 13 |
FPAY | Annual fee payment | Payment date: 20100906; Year of fee payment: 14 |
PR1001 | Payment of annual fee | Payment date: 20100906; Start annual number: 14; End annual number: 14 |
LAPS | Lapse due to unpaid annual fee | |
PC1903 | Unpaid annual fee | Termination category: Default of registration fee; Termination date: 20120809 |