
JPS62274366A - Dictionary retrieving device - Google Patents

Dictionary retrieving device

Info

Publication number
JPS62274366A
JPS62274366A JP61117773A JP11777386A JPS62274366A JP S62274366 A JPS62274366 A JP S62274366A JP 61117773 A JP61117773 A JP 61117773A JP 11777386 A JP11777386 A JP 11777386A JP S62274366 A JPS62274366 A JP S62274366A
Authority
JP
Japan
Prior art keywords
word
character
nodal point
node
directory
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP61117773A
Other languages
Japanese (ja)
Inventor
Shinsuke Sakai
坂井 信輔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NEC Corp
Original Assignee
NEC Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
1986-05-21
Filing date
1986-05-21
Publication date
1987-11-28
Application filed by NEC Corp
Priority to JP61117773A
Publication of JPS62274366A
Pending (current legal status)

Landscapes

  • Machine Translation (AREA)
  • Document Processing Apparatus (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

PURPOSE: To shorten the processing time required for character comparisons by searching only candidate words that have already been narrowed down at the dictionary lookup stage.

CONSTITUTION: A character string delivered from the analysis control unit is held in a search string buffer 212. A state memory 211 holds the backward-connection category of the word retrieved immediately before. For each character, an index 213 holds a pointer to a tree-structured directory in which every word beginning with that character forms a path from the root node of the tree to a node indicating the end of a word. Each node of the tree-structured directory has the character corresponding to that node, a successor-node selection section, and pointers to successor nodes. A node carrying the end-of-word symbol is followed by the word information of that word. A search control unit 210 uses the index 213 to obtain the appropriate tree-structured directory from the first character in the buffer 212. It then compares the characters in the buffer 212 with the characters attached to the nodes of the directory. When the traversal reaches an end-of-word symbol, a word has been found.

Description

DETAILED DESCRIPTION OF THE INVENTION

(Field of Industrial Application) The present invention relates to a dictionary search device for natural language analysis.

(Prior Art) Conventionally, in a tree-structured directory for a dictionary, each node of the tree represents one character. Words that share a common character string from their left end therefore share some of their nodes. For example, the following all share the two common nodes 「よ」 and 「う」 (Materials of the Speech Study Group, Acoustical Society of Japan, 882-82, pp. 649-654):
- the word-forming element 「よう」 (yō);
- the auxiliary verb 「よう」 expressing volition or conjecture;
- the auxiliary verbs of comparison 「ようだ」 (yō da) and 「ようです」 (yō desu);
- their inflected forms, for example 「ように」 (yō ni), 「ような」 (yō na) and 「ようでし」 (yō deshi).

(Problem to be Solved by the Invention) However, consider using a dictionary search device with such a tree-structured directory to perform longest-match morphological analysis of an input string such as 「…見ようによっては…」 ("…depending on how one looks at it…").

Once words have been identified up to 「見」 (mi, "see"), the candidates for the next word include the word-forming element 「よう」 and the auxiliary 「よう」 of volition or conjecture. They also include 「ように」, an inflected form of the auxiliary of comparison 「ようだ」. It was therefore necessary first to take 「ように」 as a word candidate, consult a table describing which words can connect to which, and determine whether it can connect to the preceding word.

Moreover, to determine that 「ようです」, 「ような」, 「ようでし」 and so on are not candidates for the next word, the input string had to be compared with each of them. This increased the need for backtracking in longest-match morphological analysis, and it increased the processing time spent on character comparisons.
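To make the conventional cost concrete, here is a minimal Python sketch of the longest-match-then-reject flow. It assumes a plain trie has already reported the two dictionary prefixes 「よう」 and 「ように」 of 「ようによっては」. The connection table `CAN_FOLLOW` and every category number except 22 are hypothetical; the patent does not specify the table's format.

```python
def longest_match(candidates, prev_category, can_follow):
    """Conventional longest-match selection: try the longest dictionary match
    first and back off to shorter ones when the connection table forbids it."""
    rejected = []
    for word in sorted(candidates, key=len, reverse=True):  # longest first
        if prev_category in can_follow.get(word, set()):
            return word, rejected
        rejected.append(word)  # this candidate was tried in vain: backtrack
    return None, rejected


# Hypothetical inputs: the prefixes a plain trie reports for 「ようによっては」,
# and a toy table of preceding categories each word may follow.
CANDIDATES = ["よう", "ように"]
CAN_FOLLOW = {"よう": {22}, "ように": {31}}

print(longest_match(CANDIDATES, 22, CAN_FOLLOW))  # ('よう', ['ように'])
```

After 「見」 (category 22), 「ように」 is tried and rejected before 「よう」 is accepted. This is the wasted step the invention aims to avoid.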

An object of the present invention is to provide a dictionary search device that solves this problem.

(Means for Solving the Problems) The present invention is a dictionary search device that searches for a word by traversing a tree-structured directory in which each symbol constituting a word is a node. It is characterized by having at least:
- a state memory that holds the backward-connection category of the word retrieved immediately before; and
- a tree-structured directory section in which each branch node has successor-node selection means and a group of pointers to the succeeding nodes. The selection means chooses the succeeding nodes reachable from the backward-connection category of the preceding word held in the state memory.
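The two claimed elements can be pictured as small data structures. The sketch below is illustrative only, and several parts of it are assumptions:
- the names `StateMemory`, `Node` and `successors`;
- the encoding of the selector as a mapping from category to the labels of reachable children;
- the rule that a category missing from the selector leaves every child reachable.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

END = "■"  # end-of-word symbol


@dataclass
class StateMemory:
    """Holds the backward-connection category of the previous word."""
    backward_category: Optional[int] = None


@dataclass
class Node:
    """A directory node: its character, a successor-node selector,
    and pointers to successor nodes."""
    char: str
    # Hypothetical selector encoding: category -> labels of reachable children.
    selector: Dict[int, Set[str]] = field(default_factory=dict)
    children: Dict[str, "Node"] = field(default_factory=dict)
    word_info: Optional[dict] = None  # used only by END nodes

    def add_child(self, child: "Node") -> "Node":
        self.children[child.char] = child
        return child

    def successors(self, state: StateMemory) -> List["Node"]:
        """Children reachable from the previous word's category.
        Assumption: a category absent from the selector restricts nothing."""
        allowed = self.selector.get(state.backward_category)
        if allowed is None:
            return list(self.children.values())
        return [c for label, c in self.children.items() if label in allowed]


# Usage: the node 「う」 under 「よ」, restricted so category 22 reaches only END.
u = Node("う", selector={22: {END}})
for ch in [END, "だ", "で", "な", "に"]:
    u.add_child(Node(ch))
print([n.char for n in u.successors(StateMemory(22))])  # ['■']
```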

(Operation) In the present invention, each branch node in the tree-structured directory has a successor-node selection section. That section uses the backward-connection category of the word retrieved immediately before, held in the state memory, to restrict the successor nodes reachable from that node. This reduces two costs:
- the backtracking caused by first adopting word candidates whose categories cannot connect to the preceding word;
- the number of comparisons between symbols in the input and symbols at the nodes of the tree-structured directory as the search proceeds from one node to the next.

(Embodiment) An embodiment of the present invention is described below with reference to FIGS. 1 to 4. FIG. 1 is a block diagram of an example of a morphological analysis device using the dictionary search device of the present invention. FIG. 2 shows the configuration of the dictionary search device according to the present invention. FIG. 3 shows the structure of a node of the tree-structured directory in FIG. 2. FIG. 4 shows the function of the successor-node selection section of FIG. 3, taking the successor-node selection section of node 202 in FIG. 2 as an example.

Input text storage 101 holds the input text.

The analysis control unit 102 reads a partial character string from the input text storage 101 and passes it to the dictionary search device 103 for dictionary lookup. In this way it carries the morphological analysis of the input text forward.

In FIG. 2, the search string buffer 212 holds the character string passed from the analysis control unit 102. The state memory 211 holds the backward-connection category of the word retrieved immediately before. For each character, the index 213 holds a pointer to a tree-structured directory. In that directory, every word beginning with the character forms a path from the root node of the tree to a node representing the end of a word.
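A minimal sketch of how the index 213 might be populated. It assumes nodes are plain dictionaries keyed by child character, with the word information stored under the end-of-word key. The lexicon and its tags are hypothetical, apart from category 22 for 「見」, which is taken from the worked example below.

```python
END = "■"  # end-of-word symbol


def build_index(lexicon):
    """Sketch of index 213: first character -> root node of the tree holding
    every word that starts with it. Each node is a dict of child character ->
    node; the END key holds the word information."""
    index = {}
    for surface, info in lexicon.items():
        node = index.setdefault(surface[0], {})  # the root stands for surface[0]
        for ch in surface[1:]:
            node = node.setdefault(ch, {})
        node[END] = info
    return index


# Hypothetical lexicon entries.
LEXICON = {
    "見": {"pos": "verb", "backward_category": 22},
    "よう": {"pos": "word-forming element"},
    "ようだ": {"pos": "auxiliary of comparison"},
    "ような": {"pos": "auxiliary of comparison, attributive form"},
    "ように": {"pos": "auxiliary of comparison, continuative form"},
}

index = build_index(LEXICON)
print(sorted(index))              # ['よ', '見']
print(sorted(index["よ"]["う"]))  # ['■', 'だ', 'な', 'に']
```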

Each node in the tree-structured directory corresponds to one character of a word. A branch from one node to another indicates that the two nodes lie on a path representing some word. For example, the attributive form 「ような」 (yō na) of the auxiliary of comparison 「ようだ」 is represented by the path through nodes 201, 202, 206 and 208.
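The path in this example can be checked against a small, hypothetical partial reconstruction of FIG. 2. Only node numbers named in the text are included. The parent links are inferred from the description: 203 is the end of 「よう」 and 208 the end of 「ような」. The actual figure may contain further nodes.

```python
END = "■"  # end-of-word symbol

# Hypothetical partial reconstruction of FIG. 2: node number -> (symbol, parent).
NODES = {
    201: ("よ", None),
    202: ("う", 201),
    203: (END, 202),  # end of 「よう」
    204: ("だ", 202),
    205: ("で", 202),
    206: ("な", 202),
    207: ("に", 202),
    208: (END, 206),  # end of 「ような」
}


def path_for(word):
    """Node numbers on the root-to-END path spelling word, or None if absent."""
    path, parent = [], None
    for sym in list(word) + [END]:
        nxt = next((n for n, (s, p) in NODES.items() if s == sym and p == parent), None)
        if nxt is None:
            return None
        path.append(nxt)
        parent = nxt
    return path


print(path_for("ような"))  # [201, 202, 206, 208]
```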

As shown in FIG. 3, each node in the tree-structured directory has three parts: the character corresponding to the node, a successor-node selection section, and pointers to its successor nodes.

A node carrying the end-of-word symbol ■ is followed by the word information of that word.

The search control unit 210 takes the first character in the search string buffer 212 and uses the index 213 to obtain the appropriate tree-structured directory. It then reads the characters in the buffer 212 one at a time from left to right. Each character is compared with the character attached to a node of the directory, and the search follows the matching nodes through the directory.

When the traversal reaches a node carrying the end-of-word symbol, a word has been found.
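The traversal performed by the search control unit 210, before any successor-node filtering is applied, can be sketched as follows. Two simplifications are assumptions made for brevity: the index is a hand-written nested dictionary, and each word's surface string stands in for its word information.

```python
END = "■"  # end-of-word symbol

# Hand-built index: first character -> root node; END holds the word information.
INDEX = {
    "よ": {"う": {END: "よう",
                  "だ": {END: "ようだ"},
                  "な": {END: "ような"},
                  "に": {END: "ように"}}},
    "見": {END: "見"},
}


def search(index, buffer):
    """Scan the buffer left to right from the tree chosen by its first
    character, collecting every word whose end-of-word node is reached."""
    if not buffer or buffer[0] not in index:
        return []
    node = index[buffer[0]]
    found = [node[END]] if END in node else []
    for ch in buffer[1:]:
        node = node.get(ch)  # match the input character against the node labels
        if node is None:
            break
        if END in node:
            found.append(node[END])
    return found


print(search(INDEX, "ようによっては"))  # ['よう', 'ように']
```

Without filtering, both 「よう」 and 「ように」 are reported. The longest-match analyzer would then try 「ように」 first.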

As an example, consider morphological analysis of the input text 「…見ようによっては…」 from left to right, giving priority to the longest match. Suppose that words have been identified up to 「見」 (mi). Suppose also that the state memory 211 holds 22, the backward-connection category of the word 「見」, as shown in FIG. 4. Because the text has been identified up to 「見」, the analysis control unit 102 performs a dictionary lookup by passing the substring 「ようによっては」 (yō ni yotte wa) to the dictionary search device 103.

The passed string 「ようによっては」 is held in the search string buffer 212 of the dictionary search device. From the first character 「よ」 of the search string, the search control unit 210 uses the index 213 to obtain the tree-structured directory whose root is node 201. The search follows node 201 (「よ」) to node 202 (「う」). The nodes that can follow node 202 include:
- node 203, carrying the end-of-word symbol;
- node 204 with 「だ」;
- node 205 with 「で」;
- node 206 with 「な」;
- node 207 with 「に」.

However, the backward-connection category currently in the state memory is 22. The successor-node selection section of node 202, shown in FIG. 4, therefore selects only the end-of-word node ■, that is, node 203, as the node to which the search may proceed. The word 「よう」 becomes the word candidate at the current position in the input text, and its word information is returned to the analysis control unit 102 as the output of the dictionary search device.

In this case there is no need to compare the character 「に」 in the input text with the nodes 「だ」, 「で」, 「な」 and 「に」 of the tree-structured directory. The analysis control unit 102 then moves on to the next position in the input text, 「に」, and continues the analysis.
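The effect of the selector at node 202 can be reproduced with a small sketch that counts character comparisons. Several parts are hypothetical: the `Node` class, the selector encoding `{22: {END}}`, and the insertion order of node 202's children. Only category 22 and the resulting restriction to node 203 come from the text.

```python
END = "■"  # end-of-word symbol


class Node:
    def __init__(self, char):
        self.char = char
        self.children = {}  # label -> Node (pointers to successor nodes)
        self.selector = {}  # hypothetical encoding: category -> allowed child labels
        self.info = None    # word information, set only on END nodes


def build(words):
    """Build one tree; all words must start with the same character (the root)."""
    root = Node(words[0][0])
    for w in words:
        node = root
        for ch in list(w[1:]) + [END]:
            node = node.children.setdefault(ch, Node(ch))
        node.info = w
    return root


def search(root, text, prev_category=None, use_selector=True):
    """Follow text from root (which stands for text[0]); return the words
    found and the number of input-vs-node character comparisons."""
    found, comparisons = [], 0
    node, i = root, 1
    while node is not None:
        labels = list(node.children)
        if use_selector and prev_category in node.selector:
            # Successor-node selection: keep only children reachable
            # from the previous word's backward-connection category.
            labels = [lab for lab in labels if lab in node.selector[prev_category]]
        if END in labels:
            found.append(node.children[END].info)
        nxt = None
        if i < len(text):
            for label in labels:
                if label == END:
                    continue
                comparisons += 1
                if label == text[i]:
                    nxt = node.children[label]
                    break
        node, i = nxt, i + 1
    return found, comparisons


root = build(["よう", "ようだ", "ようです", "ような", "ように"])  # root = node 201
root.children["う"].selector = {22: {END}}                       # node 202

print(search(root, "ようによっては", prev_category=22))   # (['よう'], 1)
print(search(root, "ようによっては", use_selector=False))  # (['よう', 'ように'], 5)
```

With the selector, the traversal stops at node 202 after a single comparison and reports only 「よう」. Without it, 「に」 is compared against 「だ」, 「で」, 「な」 and 「に」, and the longer 「ように」 is also reported, which the analyzer must later reject.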

By contrast, consider the conventional approach, which gives priority to the longest word as a candidate. Without this function of restricting the successor nodes in advance by the backward-connection category of the preceding word, the continuative form 「ように」 of the auxiliary of comparison 「ようだ」 is selected first as a word candidate. More wasted computation steps are therefore required than with the dictionary search device of the present invention.

The dictionary search device of the present invention can be used not only for morphological analysis of mixed kanji-kana text but also, for example, in the language processing section of a speech recognition device.

(Effects of the Invention) As described above, the present invention searches only the candidate words already narrowed down at the dictionary lookup stage. This reduces two costs:
- the backtracking caused by first adopting word candidates whose categories cannot connect to the preceding word;
- the number of wasted symbol-to-symbol comparisons.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram showing an example of a morphological analysis device using the dictionary search device of the present invention. FIG. 2 shows an example of the structure of the tree-structured directory of the dictionary search device according to the present invention. FIG. 3 shows the structure of a node of the tree-structured directory in FIG. 2. FIG. 4 illustrates the function of the successor-node selection section, taking the successor-node selection section of node 202 in FIG. 2 as an example.

Reference numerals in the figures:
- 101: input text storage
- 102: analysis control unit
- 103: dictionary search device
- 210: search control unit
- 211: state memory
- 212: search string buffer
- 213: index
- 201, 202, 204, 205, 206, 207: nodes of the tree-structured directory
- 203, 208: nodes of the tree-structured directory that carry the end-of-word symbol

Claims (1)

[Claims] A dictionary search device that searches for a word by traversing a tree-structured directory in which each symbol constituting a word is a node, characterized by comprising:
(a) a state memory that holds at least the backward-connection category of the word retrieved immediately before; and
(b) a tree-structured directory section having, at each branch node, successor-node selection means and a group of pointers to the succeeding nodes, the selection means selecting the succeeding nodes reachable from the backward-connection category of the preceding word held in the state memory.

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP61117773A JPS62274366A (en) 1986-05-21 1986-05-21 Dictionary retrieving device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP61117773A JPS62274366A (en) 1986-05-21 1986-05-21 Dictionary retrieving device

Publications (1)

Publication Number Publication Date
JPS62274366A (en) 1987-11-28

Family

ID=14719967

Family Applications (1)

Application Number Title Priority Date Filing Date
JP61117773A Pending JPS62274366A (en) 1986-05-21 1986-05-21 Dictionary retrieving device

Country Status (1)

Country Link
JP (1) JPS62274366A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH05290082A (en) * 1992-03-23 1993-11-05 Internatl Business Mach Corp <Ibm> Translater based on pattern

Similar Documents

Publication Publication Date Title
JPH0689302A (en) Dictionary memory
JP2007334429A (en) Keyword generating method, document retrieval method, topic range estimating method, topic boundary estimating method, and device therefor, its program, and its recording medium
JPS62274366A (en) Dictionary retrieving device
JP2595934B2 (en) Kana-Kanji conversion processor
JPH0793345A (en) Document retrieval device
JPS588379A (en) Kana (japanese syllabary)-kanji (chinese character) converting system
JP3353769B2 (en) Character recognition device, character recognition method, and character recognition program recording medium
JPS61190657A (en) Recognizing system for japanese language character string
JPH0350669A (en) Information processor
JPH10254881A (en) Machine translation device
JPS59100939A (en) Japanese word input device
JPH04290158A (en) Document creation device
JPS61267824A (en) Japanese data sorting processing method
JPS62282364A (en) Character string retrieval system
JP2574741B2 (en) Language processing method
JPS63138479A (en) Character recognizing device
JPS63129465A (en) Sentence understanding support device
JPS59116835A (en) Japanese input device with input abbreviating function
JPS62203276A (en) Form element analysis device
JPS60193068A (en) Text analysis method
JPH04279966A (en) Clause punctuation learning information retrieval system for kana-kanji conversion device
JPH0695330B2 (en) Document creation device
JPH03152667A (en) Analyzing method for japanese sentence
JPS61156464A (en) Document preparing device
JPS60140460A (en) Abbreviated converting system in kana (japanese syllabary) kanji (chinese character) converter