JPS62274366A - Dictionary retrieving device

Info

Publication number: JPS62274366A
Application number: JP61117773A
Authority: JP
Inventors: Shinsuke Sakai; 坂井　信輔
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1986-05-21
Filing date: 1986-05-21
Publication date: 1987-11-28

Abstract

PURPOSE: To shorten the processing time required for character comparison by searching only candidate words that have already been narrowed down at the dictionary search stage. CONSTITUTION: A character string passed from an analysis control unit is held in a search string buffer 212. A state memory 211 holds the backward connection category of the word retrieved immediately before. An index 213 holds, for each character, a pointer to a tree-structured directory in which each word beginning with that character forms a path from the root node to a node marking the end of a word. Each node of the tree-structured directory has the character corresponding to that node, a subsequent node selection section, and pointers to subsequent nodes. A node carrying the word-end symbol is followed by the word information for that word. A search control unit 210 obtains the appropriate tree-structured directory through the index 213 from the first character in the buffer 212, then compares the characters in the buffer 212 with the characters attached to the nodes while following the nodes; when it reaches a word-end symbol, a word has been found.

Description

DETAILED DESCRIPTION OF THE INVENTION

(Field of Industrial Application) The present invention relates to a dictionary search device for natural language analysis.

(Prior Art) Conventionally, in a tree-structured directory for a dictionary, each node of the tree represents one character, and words that share a common character string from their left end share some nodes. For example, the word-formation component "よう" (you), the volitional/conjectural auxiliary verb "よう", the comparative auxiliary verbs "ようだ" (youda) and "ようです" (youdesu), and their conjugated forms (for example "ように" (youni), "ような" (youna), and "ようでし" (youdeshi)) share the two nodes "よ" (yo) and "う" (u) (Acoustical Society of Japan, Speech Research Group Materials J 882-82, pp. 649-654).
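The prefix sharing described above can be illustrated with a minimal sketch. It is not the cited implementation: the nested-dict representation and the names `build_trie`, `count_nodes`, and `END` are assumptions made for this illustration only, and homographs such as the two words spelled "よう" are collapsed into a single entry.

```python
# Conventional character trie: one node per character, shared prefixes share nodes.
END = "■"  # hypothetical key marking the end of a word in this sketch


def build_trie(words):
    """Return a nested-dict trie; each key is one character, END marks a word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = word  # a real dictionary would store word information here
    return root


def count_nodes(node):
    """Count character nodes (END markers are not counted)."""
    return sum(1 + count_nodes(child) for ch, child in node.items() if ch != END)


words = ["よう", "ようだ", "ようです", "ように", "ような", "ようでし"]
trie = build_trie(words)

# Everything after the shared prefix "よ" -> "う" hangs off a single node.
shared = trie["よ"]["う"]
print(sorted(shared.keys()))
print(count_nodes(trie), "character nodes for", sum(map(len, words)), "characters")
```

In this sketch all six spellings pass through the same "よ" and "う" nodes, so the branching that the rest of the description is concerned with happens at the node for "う".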

(Problem to be Solved by the Invention) Consider, however, using a dictionary search device with such a tree-structured directory to perform morphological analysis of the input string "...見ようによっては..." ("...depending on how you look at it...") by the longest-match method. Once words have been identified up to "見" (mi), the candidates for the next word include the word-formation component "よう", the volitional/conjectural auxiliary verb "よう", and "ように", a conjugated form of the comparative auxiliary verb "ようだ". Under longest match, "ように" is taken as the first word candidate, and a table describing the connectability between words must then be consulted to determine whether it can connect to the immediately preceding word. Moreover, to determine that "ようです", "ような", "ようでし", and so on are not candidates for the next word, the input string has to be compared against each of them. This increases the need for backtracking in longest-match morphological analysis and increases the processing time spent on character comparison.

An object of the present invention is to provide a dictionary search device that solves this problem.

(Means for Solving the Problems) The present invention is a dictionary search device that searches for a word while tracing a tree-structured directory in which each symbol constituting a word is a node, characterized by comprising at least: a state memory that holds the backward connection category of the word searched immediately before; and a tree-structured directory section that has, at each branching node, subsequent node selection means for selecting the subsequent nodes reachable from the backward connection category of the immediately preceding word held in the state memory, and a group of pointers to the subsequent nodes.

(Operation) In the present invention, the subsequent node selection section of each branching node in the tree-structured directory uses the backward connection category of the word searched immediately before, held in the state memory, to limit the subsequent nodes reachable from that node. This reduces the backtracking caused by first adopting word candidates in categories that cannot connect to the preceding word, and reduces the number of comparisons between input symbols and node symbols made while tracing from one node of the tree-structured directory to the next.

(Example) An embodiment of the present invention is described below with reference to FIGS. 1 to 4. FIG. 1 is a block diagram showing an example of a morphological analysis device using the dictionary search device of the present invention. FIG. 2 shows the configuration of the dictionary search device according to the present invention. FIG. 3 shows the structure of a node of the tree-structured directory in FIG. 2. FIG. 4 shows the function of the subsequent node selection section of FIG. 3, taking that of node 202 in FIG. 2 as an example.

The input text storage 101 holds the input text.

The analysis control unit 102 reads a partial character string from the input text storage 101 and passes it to the dictionary search device 103 for dictionary lookup, advancing the morphological analysis of the input text as it goes.

In FIG. 2, the search string buffer 212 holds the character string passed from the analysis control unit 102. The state memory 211 holds the backward connection category of the word retrieved immediately before. The index 213 holds, for each character, a pointer to the tree-structured directory in which every word beginning with that character is a path from the root node of the tree to a node (■) representing the end of a word.

Each node of the tree-structured directory corresponds to one character of a word, and a branch from one node to another indicates that both nodes lie on a path representing some word. For example, "ような", the attributive form of the comparative auxiliary verb "ようだ", is represented by the path through nodes 201, 202, 206, and 208.

As shown in FIG. 3, each node of the tree-structured directory has the character corresponding to that node, a subsequent node selection section, and pointers to subsequent nodes.
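One way to picture this node layout is the sketch below. It is an illustration under assumptions rather than the structure of FIG. 3 itself: the class name `Node`, the `reachable` method, and the encoding of the subsequent node selection section as a mapping from category to child indices are all hypothetical. Only the rule that category 22 selects node 203 at node 202 comes from the text; category 7 and its row are invented for contrast, as is the choice to return no successors for a category the selector does not list.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

END = "■"  # word-end symbol used in the text


@dataclass
class Node:
    char: str                                              # character attached to this node
    children: List["Node"] = field(default_factory=list)   # pointers to subsequent nodes
    # Subsequent node selection section, modelled here as
    # backward-connection category -> indices into `children`.
    selector: Optional[Dict[int, List[int]]] = None
    word_info: Optional[str] = None                        # set only on word-end nodes

    def reachable(self, category: int) -> List["Node"]:
        """Subsequent nodes permitted after a word with this backward category."""
        if self.selector is None:        # non-branching node: nothing to filter
            return list(self.children)
        return [self.children[i] for i in self.selector.get(category, [])]


node_203 = Node(END, word_info="よう")
node_204, node_205 = Node("だ"), Node("で")
node_206, node_207 = Node("な"), Node("に")
node_202 = Node(
    "う",
    children=[node_203, node_204, node_205, node_206, node_207],
    selector={22: [0], 7: [0, 1, 2, 3, 4]},  # 22 is from the text, 7 is hypothetical
)

print([n.char for n in node_202.reachable(22)])
print([n.char for n in node_202.reachable(7)])
```

Storing indices into the pointer list keeps the selection step a table lookup on the category, so no input character has to be examined to rule a branch out.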

A node carrying the symbol ■, which marks the end of a word, is followed by the word information for that word.

The search control unit 210 takes the first character in the search string buffer 212 and obtains the appropriate tree-structured directory through the index 213. It then steps through the characters in the search string buffer 212 from left to right, one at a time, comparing each with the character attached to a node of the tree-structured directory, and follows the nodes accordingly.

When it reaches the symbol ■ marking the end of a word, a word has been found.

For example, consider morphologically analyzing the input text "...見ようによっては..." from left to right with longest match given priority. Suppose words have been identified up to "見" and the state memory 211 holds 22, the backward connection category of the word "見" as shown in FIG. 4. Since the text has been identified up to "見", the analysis control unit 102 performs a dictionary lookup by passing the partial string "ようによっては" to the dictionary search device 103.

The passed string "ようによっては" is held in the search string buffer 212 of the dictionary search device. From the first character "よ" of the search string, the search control unit 210 obtains through the index 213 the tree-structured directory whose root node is node 201. After following node 201 ("よ") to node 202 ("う"), the nodes that could follow node 202 are node 203 with the word-end symbol ■, node 204 with "だ", node 205 with "で", node 206 with "な", node 207 with "に", and so on. Because the backward connection category currently in the state memory is 22, however, the subsequent node selection section of node 202 shown in FIG. 4 selects only the word-end node ■, that is, node 203, as a node to which the search may proceed. The word "よう" therefore becomes the word candidate at the current position in the input text, and the word information for "よう" is returned to the analysis control unit 102 as the output of the dictionary search device. In this process, the character "に" in the input text need not be compared with the nodes "だ", "で", "な", and "に" of the tree-structured directory. The analysis control unit 102 then advances to the next position in the input text, "に", and continues the analysis.
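The walk-through above can be traced with a small sketch. It is only an approximation of the described device: `Node`, `lookup`, the plain dict standing in for index 213, the label `"207-end"`, and category 30 with its selector row are hypothetical. The node numerals 201 to 208 and the rule that category 22 permits only node 203 at node 202 are taken from the text.

```python
END = "■"  # word-end symbol


class Node:
    def __init__(self, node_id, char, word_info=None):
        self.node_id = node_id      # reference numeral, or a made-up label
        self.char = char
        self.word_info = word_info  # only on word-end nodes
        self.children = []
        self.selector = None        # None: not a branching node, no filtering

    def successors(self, category):
        if self.selector is None:
            return self.children
        return [self.children[i] for i in self.selector.get(category, [])]


def lookup(index, category, text):
    """Return (word_info of the longest permitted match, ids of visited nodes)."""
    node = index.get(text[0]) if text else None
    if node is None:
        return None, []
    visited, best, pos = [node.node_id], None, 1
    while True:
        nxt = None
        for child in node.successors(category):
            if child.char == END:                  # a word ends here
                best = child.word_info
                visited.append(child.node_id)
            elif pos < len(text) and child.char == text[pos]:
                nxt = child
        if nxt is None:
            return best, visited
        node, pos = nxt, pos + 1
        visited.append(node.node_id)


n201, n202 = Node(201, "よ"), Node(202, "う")
n203 = Node(203, END, "よう")
n204, n205 = Node(204, "だ"), Node(205, "で")
n206, n207 = Node(206, "な"), Node(207, "に")
n208 = Node(208, END, "ような")
n207_end = Node("207-end", END, "ように")       # hypothetical label
n201.children = [n202]
n202.children = [n203, n204, n205, n206, n207]
n202.selector = {22: [0], 30: [1, 2, 3, 4]}     # category 30 is hypothetical
n206.children = [n208]
n207.children = [n207_end]
index = {"よ": n201}                            # stands in for index 213

print(lookup(index, 22, "ようによっては"))
print(lookup(index, 30, "ようによっては"))
```

With category 22 in the state memory the sketch stops at node 203 and returns "よう" without touching nodes 204 to 207; under the invented category 30 the same input instead continues through node 207 to "ように".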

By contrast, when the longest word is given priority as a candidate in the conventional manner, without this function of restricting the subsequent nodes in advance according to the backward connection category of the immediately preceding word, "ように", the continuative form of the comparative auxiliary verb "ようだ", is selected first as the word candidate, so more wasted computation steps are needed than with the dictionary search device of the present invention. The dictionary search device of the present invention can be used not only for morphological analysis of mixed kanji-kana text but also, for example, in the language processing section of a speech recognition device.
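To make the difference in work concrete, the following sketch counts character comparisons for the same input with and without the restriction. It rests on simplifying assumptions that are not in the source: children are scanned linearly in the order listed for node 202, the restriction is modelled by a hypothetical predicate `allowed(depth, key)`, and `TRIE`, `walk`, and `after_category_22` are names made up for this example.

```python
END = "■"

# Children of the node for "う" appear in the order given for node 202.
TRIE = {"よ": {"う": {
    END: "よう",
    "だ": {END: "ようだ"},
    "で": {"す": {END: "ようです"}, "し": {END: "ようでし"}},
    "な": {END: "ような"},
    "に": {END: "ように"},
}}}


def walk(trie, text, allowed=lambda depth, key: True):
    """Return (candidate words, longest first; number of character comparisons)."""
    node, depth, comparisons, found = trie, 0, 0, []
    while True:
        if END in node and allowed(depth, END):
            found.append(node[END])
        if depth == len(text):
            break
        nxt = None
        for key, child in node.items():
            if key == END or not allowed(depth, key):
                continue                  # filtered branches cost no comparison
            comparisons += 1
            if key == text[depth]:
                nxt = child
                break
        if nxt is None:
            break
        node, depth = nxt, depth + 1
    return found[::-1], comparisons


def after_category_22(depth, key):
    # depth 2 is the choice made at the node for "う": only the word end is permitted
    return depth != 2 or key == END


print(walk(TRIE, "ようによっては"))
print(walk(TRIE, "ようによっては", after_category_22))
```

Under this model the unrestricted walk makes 6 comparisons and proposes "ように" ahead of "よう", so a connection check and a backtrack would still be needed; the restricted walk makes 2 comparisons and proposes only "よう".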

(Effects of the Invention) As described above, according to the present invention only the candidate words narrowed down at the dictionary search stage are searched, which reduces both the backtracking caused by first adopting word candidates in categories that cannot connect to the preceding word and the number of needless symbol-to-symbol comparisons.

[Brief explanation of drawings]

FIG. 1 is a block diagram showing an example of a morphological analysis device using the dictionary search device of the present invention. FIG. 2 is a diagram showing an example of the configuration of the tree-structured directory of the dictionary search device according to the present invention, and FIG. 3 is a diagram showing the structure of a node of the tree-structured directory in FIG. 2. FIG. 4 is a diagram showing the function of the subsequent node selection section, taking that of node 202 in FIG. 2 as an example. In the figures, 101 is the input text storage, 102 the analysis control unit, 103 the dictionary search device, 210 the search control unit, 211 the state memory, 212 the search string buffer, and 213 the index; 201, 202, 204, 205, 206, and 207 are nodes of the tree-structured directory; and 203 and 208 are word-end nodes of the tree-structured directory.

Claims

[Scope of Claims] A dictionary search device that searches for a word while tracing a tree-structured directory in which each symbol constituting a word is a node, the device comprising at least: (a) a state memory that holds the backward connection category of the word searched immediately before; and (b) a tree-structured directory section having, at each branching node, subsequent node selection means for selecting the subsequent nodes reachable from the backward connection category of the immediately preceding word held in the state memory, and a group of pointers to the subsequent nodes.