JP3005531B1

JP3005531B1 - Dictionary data search method and apparatus, search dictionary and index creation method

Info

Publication number: JP3005531B1
Application number: JP10228069A
Authority: JP
Inventors: 佳徳田村
Original assignee: 日本電気アイシーマイコンシステム株式会社
Priority date: 1998-08-12
Filing date: 1998-08-12
Publication date: 2000-01-31
Anticipated expiration: 2018-08-12
Also published as: JP2000057165A

Abstract

[Problem] To speed up data search by reducing the number of characters that must be compared when dictionary data is searched for an input character string.

[Solution] An index search means 21 compares a character string input from an input device 1 with the character strings of the records stored in an index storage unit 31 (for example, ans, ant@, anta) to check whether a divided dictionary corresponding to the input character string exists. A dictionary search means 22 retrieves from a dictionary storage unit 32 the divided dictionary that the index search means 21 found on the basis of the records. For example, suppose that "antarctic" is input from the input device 1 and that the index search means 21 extracts the leading character string "anta". The dictionary search means 22 then retrieves the divided dictionary whose leading character string is "anta" from the dictionary storage unit 32. Because the retrieved divided dictionary is a set of headwords that all begin with "anta", the first four characters "anta" of the search string "antarctic" need not be compared.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a dictionary data search method and apparatus for searching the headwords of a dictionary for a character string input from an input device such as the keys of a keyboard, and to methods of creating a search dictionary and an index.

[0002]

[Prior Art] Conventionally, when the headwords of a dictionary are searched using an index, the dictionary headwords themselves have been used as the index entries.

[0003] With this approach, whether a headword is in the dictionary can be determined on the index alone. However, as the number of dictionary headwords grows, the number of index entries grows by the same amount, which leaves a problem of memory efficiency.

[0004] A method has therefore been developed in which the dictionary is divided into several small dictionaries and the index is set so as to point to the head of each divided dictionary. Searching in two stages, first on the index and then on the dictionary, allows many headwords to be searched with little memory.
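The two-stage search described above can be sketched as follows. This is a minimal illustration rather than the method of any cited publication: the function names, the fixed block size, and the sample words are all assumptions made for the example.

```python
from bisect import bisect_right


def build_blocks(headwords, block_size):
    """Cut a sorted headword list mechanically into fixed-size blocks.

    Returns (index, blocks); index[i] is the first headword of blocks[i].
    """
    blocks = [headwords[i:i + block_size]
              for i in range(0, len(headwords), block_size)]
    index = [block[0] for block in blocks]
    return index, blocks


def two_stage_search(index, blocks, query):
    # Stage 1 (index): last block whose first headword is <= query.
    pos = bisect_right(index, query) - 1
    if pos < 0:
        return False  # query sorts before every headword
    # Stage 2 (dictionary): whole-string comparison inside that block.
    return query in blocks[pos]


# Hypothetical sample data, chosen only for illustration.
WORDS = ["awe", "axis", "azure", "baby", "back", "bad"]
INDEX, BLOCKS = build_blocks(WORDS, 2)
```

With this sample data the index holds only three entries for six headwords. The block headed by "azure" also holds "baby", so the second stage cannot assume any shared leading characters and has to compare each candidate from its first character.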

[0005]

[Problems to Be Solved by the Invention] One such search method is disclosed in JP-A-S62-197822. In that technique, however, the dictionary is divided mechanically, so that, for example, a divided dictionary designated by the index entry "azure" may contain words beginning with "b". Consequently, searching a divided dictionary for an input character string requires comparison starting from the first character.

[0006] JP-A-H07-129600 discloses a search method that solves the problem of the technique of JP-A-S62-197822 by reducing the number of characters compared when an input character string is searched for.

[0007] FIG. 11 shows the data structure of the index and the dictionary used in the search method disclosed in JP-A-H07-129600. That publication describes an example based on a Japanese dictionary; for comparison with the present invention, the explanation here substitutes an English-Japanese dictionary.

[0008] The index C1 shown in FIG. 11 is a collection of the first headwords of the divided dictionaries. The dictionary C2 is a collection of a plurality of divided dictionaries. In each divided dictionary, every headword other than the first headword taken into the index C1 is stored as a code (for example, 1, 2, 3) giving the number of characters it shares with the immediately preceding headword, paired with the characters that follow the shared string (for example, back, ndon, te).
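The coded structure described above (a match count plus the remaining characters) can be sketched as follows. This is an illustrative encoder only; the function names and the tuple layout are assumptions and are not taken from JP-A-H07-129600. The sample words are the ones used in the worked example of this description.

```python
def common_prefix_len(a, b):
    """Number of leading characters that a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def front_encode(headwords):
    """Encode a sorted, non-empty headword list.

    Returns (first, coded): `first` is stored in full (it is also the index
    entry); every later headword becomes (code, suffix), where `code` is the
    number of characters shared with the immediately preceding headword.
    """
    first = headwords[0]
    coded = []
    for prev, cur in zip(headwords, headwords[1:]):
        k = common_prefix_len(prev, cur)
        coded.append((k, cur[k:]))
    return first, coded


FIRST, CODED = front_encode(["abbey", "abbreviate", "abdicate", "abduce"])
# FIRST == "abbey"
# CODED == [(3, "reviate"), (2, "dicate"), (3, "uce")]
```

The resulting codes 3, 2, 3 and suffixes "reviate", "dicate", "uce" agree with the values that the search walkthrough in this description reads from FIG. 11.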

[0009] Although they do not appear in JP-A-H07-129600, the words C3 (for example, aback, abandon) corresponding to the entries of the divided dictionary are also shown to make the structure easier to understand.

[0010] Next, the operation of searching a divided dictionary in the search method disclosed in JP-A-H07-129600 is described in detail with reference to FIG. 12, using a concrete example.

[0011] The example searches for the character string "abduce" using the index C1 and the dictionary C2 shown in FIG. 11.

[0012] In FIG. 12, after the divided dictionary whose first headword is "abbey" (Japanese: 修道院) has been loaded, the match count is reset to "0" as initial processing in step J1, and the first data item "abbey" is obtained as dictionary data in step J2. Because this data item carries no code of the kind shown in FIG. 11, processing proceeds to step J6, where the dictionary keyword extraction process extracts the keyword "abbey".

[0013] Processing then proceeds to step J7, where the search string "abduce" and the dictionary data "abbey" are compared character by character from the first character "a". Two characters match (a, b), and the result "search data large", indicating that the search data "abduce" is the longer, is output.

[0014] Processing then proceeds to step J10, where it is determined whether the data has ended. If it has not, processing proceeds to step J11, where the match count is updated from "0" to "2". In step J12 the next search data is set to "reviate", and processing returns to step J2.

[0015] As shown in FIG. 11, the search data "reviate" carries the code "3", so processing proceeds from step J3 to step J4. In step J4, the code value "3" is compared with the match count set at that point. Here the code is larger, so processing proceeds to step J10, where it is determined whether the data has ended; the next search data is set to "dicate", and processing returns to step J2.

[0016] As shown in FIG. 11, the search data "dicate" carries the code "2", so processing proceeds from step J3 to step J4. In step J4, the code value "2" is compared with the match count set at that point. Here the code equals the match count, so processing proceeds to step J5, where the comparison start position is set to the third character. In step J6, the comparison keyword "dicate" is extracted.

[0017] Processing then proceeds to step J7, where "duce", the third and subsequent characters of the search string "abduce", is compared character by character with the dictionary data "dicate". One character matches, and "search data large", indicating that the search data is the longer, is output. Processing proceeds to step J10, where it is determined whether the data has ended. If it has not, processing proceeds to step J11, where the match count is updated from "2" to "3". In step J12 the next search data is set to "uce", and processing returns to step J2.

[0018] As shown in FIG. 11, the search data "uce" carries the code "3", so processing proceeds from step J3 to step J4. In step J4, the code value "3" is compared with the match count set at that point. Here the code equals the match count, so processing proceeds to step J5, where the comparison start position is set to the fourth character. In step J6, the comparison keyword "uce" is extracted.

[0019] Processing then proceeds to step J7, where "uce", the fourth and subsequent characters of the search string "abduce", is compared character by character with the dictionary data "uce". Three characters match, and since three characters were compared, the keywords are regarded as matching (step J8). Processing proceeds to step J14 and ends with the result that the target data is "present".

[0020] In this way, the search on the dictionary need only examine the characters after the match count, so the number of characters compared per headword can be reduced.
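The walkthrough of steps J1 to J14 can be condensed into the following sketch. It is an interpretation of the steps as described here, not code from JP-A-H07-129600. The names are hypothetical, and the branch that stops when a code is smaller than the match count is not spelled out in the text: it is an assumption that follows from the headwords being sorted.

```python
def common_prefix_len(a, b):
    """Number of leading characters that a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def search_front_coded(first, coded, query):
    """Search one divided dictionary stored as (code, suffix) entries."""
    matched = 0                          # step J1: reset the match count
    for code, suffix in [(0, first)] + coded:
        if code > matched:               # step J4: entry cannot match, skip it
            continue
        if code < matched:               # assumption: entry sorts after query
            return False
        rest = query[matched:]           # step J5: start after matched chars
        k = common_prefix_len(rest, suffix)           # step J7
        if k == len(rest) and k == len(suffix):       # step J8: full match
            return True
        if rest < suffix:                # query sorts before this entry
            return False
        matched += k                     # step J11: update the match count
    return False                         # data ended without a match


FIRST = "abbey"
CODED = [(3, "reviate"), (2, "dicate"), (3, "uce")]
found = search_front_coded(FIRST, CODED, "abduce")   # True
```

For "abduce" the match count moves from 0 to 2 to 3, as in the walkthrough, and the entry "reviate" is skipped without any character comparison. The match count must nevertheless be maintained across every entry that is compared.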

[0021] However, because the search method disclosed in JP-A-H07-129600 is built on the match relationship between each headword and the one immediately before it, the match count must be updated (step J11) every time the character string is compared with search data.

[0022] Dictionary search is also not limited to looking up an input character string. When, for example, an arrow key is pressed on an input device such as a keyboard in order to retrieve the dictionary data immediately before the data just found, the headword has to be rebuilt by going back to the head of the divided dictionary. The method is therefore poorly suited to retrieving the entry immediately preceding an entry that has been found.
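The cost of stepping backwards through a coded divided dictionary can be seen in a short sketch. The function name and the (code, suffix) tuple layout are assumptions made for illustration; the sample entries are those of the "abbey" example in this description.

```python
def headword_at(first, coded, position):
    """Rebuild the headword at `position` (0 is the first headword).

    Each coded entry only makes sense relative to the headword before it,
    so the rebuild has to replay every entry from the head of the block.
    """
    word = first
    for code, suffix in coded[:position]:
        word = word[:code] + suffix
    return word


FIRST = "abbey"
CODED = [(3, "reviate"), (2, "dicate"), (3, "uce")]

current = headword_at(FIRST, CODED, 3)    # "abduce"
previous = headword_at(FIRST, CODED, 2)   # "abdicate"
```

Even though "abduce" has just been found, obtaining its predecessor "abdicate" means starting again from "abbey", because the stored entry for "abdicate" is only the pair (2, "dicate").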

[0023] Furthermore, even when a word that is not in the dictionary is input, this cannot be determined by the search on the index, and the dictionary must always be searched. Extra search time is therefore spent even when searching for a headword that is not in the dictionary (that is, when establishing that there is "no target data").

[0024] Accordingly, solving the problems of the search method disclosed in JP-A-H07-129600, namely reducing the number of characters compared so as to shorten the search time and making it easy to retrieve the data before and after a word that has been found, requires reconsidering not only the search flow but also the dictionary structure itself.

[0025] An object of the present invention is to provide a dictionary data search method and apparatus, and methods of creating a search dictionary and an index, that solve these conventional problems.

[0026]

[Means for Solving the Problems] To achieve the above object, the dictionary data search method according to the present invention searches the headwords of a dictionary for an input character string. A leading character string is extracted from the data of the input character string as an index. The method uses a plurality of divided dictionaries, each composed of a plurality of headwords sharing a common leading character string and each limited in data amount to within a predetermined allowable range by varying the number of characters in the leading character string, together with a plurality of records that use the leading character string as an index and associate the data of each leading character string with the data of the divided dictionary corresponding to it. With the extracted leading character string as the index, the divided dictionary corresponding to the extracted leading character string is looked up on the basis of the records. The input character string is then searched for among the headwords of that divided dictionary by comparing the remaining character data of the input character string, excluding the extracted leading character string, with the data of the divided dictionary.

[0027] The dictionary data search apparatus according to the present invention has an input device, a dictionary storage unit, an index storage unit, an index search means, and a dictionary search means, and searches the headwords of a dictionary for an input character string. The input device inputs the character string to be searched for. The dictionary storage unit stores a plurality of divided dictionaries, each composed of a plurality of headwords sharing a common leading character string and each limited in data amount to within a predetermined allowable range by varying the number of characters in the leading character string. The index storage unit stores records that use the data of the leading character string as an index and associate the data of each leading character string with the data of the divided dictionary corresponding to it. The index search means extracts the leading character string serving as the index from the data of the input character string and, with the extracted leading character string as the index, looks up the data of the divided dictionary corresponding to it on the basis of the records. The dictionary search means refers to the divided dictionary found by the index search means and searches its headwords for the input character string by comparing the remaining character data of the input character string, excluding the extracted leading character string, with the data of the divided dictionary.

[0028] The method of creating a search dictionary according to the present invention creates a search dictionary composed of a plurality of headwords against which an input character string is compared. The search dictionary consists of a plurality of divided dictionaries, each composed of a plurality of headwords sharing a common leading character string. A single base dictionary is divided into a plurality of divided dictionaries in units of a common leading n characters (where n is an integer). It is then checked whether the data amount contained in each divided dictionary is within the allowable range for dictionary data. If the data amount of a divided dictionary exceeds the allowable range, the divided dictionary with the largest data amount is detected and is re-divided into a plurality of divided dictionaries in units of a common leading (n+1) characters. If the data amount contained in the re-divided dictionaries again exceeds the allowable range, the re-divided dictionary with the largest data amount is re-divided with the number of leading characters changed. The data amounts of all divided dictionaries are thereby adjusted to within the allowable range, and the divided dictionaries are created.

[0029] The method of creating an index according to the present invention creates an index that associates divided dictionaries, each composed of a plurality of headwords sharing a common leading character string, with those leading character strings. A base dictionary is divided into a plurality of divided dictionaries in units of a common leading n characters (where n is an integer), and each leading n-character string is associated with the divided dictionary corresponding to it. It is then checked whether the data amount contained in each divided dictionary is within the allowable range for dictionary data. If the data amount of a divided dictionary exceeds the allowable range, the divided dictionary with the largest data amount is detected and is re-divided into a plurality of divided dictionaries in units of a common leading (n+1) characters. Each leading (n+1)-character string is associated with the divided dictionary corresponding to it, thereby creating the index that associates the divided dictionaries with the leading character strings.

[0030] In addition, the index data, whose amount increases with the number of re-divided dictionaries, is divided and managed.

[0031]

[Embodiments of the Invention] Embodiments of the present invention are described below with reference to the drawings.

[0032] (Embodiment 1) FIG. 1 is a block diagram showing a dictionary data search apparatus according to Embodiment 1 of the present invention.

[0033] In FIG. 1, the dictionary data search apparatus according to Embodiment 1 of the present invention has an input device 1, an information storage device 3, and a data processing device 2, and searches the headwords of a dictionary for an input character string.

[0034] The input device 1 inputs the character string to be searched for; a keyboard or the like is used.

[0035] The information storage device 3 includes a dictionary storage unit 32 and an index storage unit 31. As shown in FIG. 2, the dictionary storage unit 32 stores a plurality of divided dictionaries 200, 201, 202, 203, 204, ..., each composed of a plurality of headwords sharing a common leading character string.

[0036] In the dictionary storage unit 32 shown in FIG. 2, headwords sharing a leading character string are grouped together. For example, for the leading character string "ans", the headwords answer, answerable, ... make up the divided dictionary 200; for the leading character string "ant@", the headword ant makes up the divided dictionary 201. The "@" in "ant@" means that no character follows "ant"; a symbol other than "@" may be used instead.

[0037] As FIG. 2 further shows, for the leading character string "anta", the headwords antacid, antagonism, and antarctic make up the divided dictionary 202. Likewise, the leading character string "ante" has the divided dictionary 203, and the leading character string "anth" has the divided dictionary 204.

[0038] As shown in FIG. 2, the index storage unit 31 stores records 100, 101, 102, 103, 104, .... These records use the data of the leading character strings (for example, ans, ant@, anta, ...) as the index and associate each leading character string with the data of the corresponding divided dictionary 200, 201, 202, 203, 204, ....
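The relationship between the records and the divided dictionaries can be pictured with a small data-structure sketch. The class and function names are hypothetical, and the pairing of each record numeral with a particular leading character string is an assumption based on the order in which they are listed; only the dictionaries whose headwords are given in the text are filled in.

```python
from dataclasses import dataclass

TERMINATOR = "@"  # marks "no character follows", as in "ant@"


@dataclass(frozen=True)
class Record:
    prefix: str      # leading character string used as the index key
    dictionary: int  # reference numeral of the associated divided dictionary


# Index storage unit 31: one record per leading character string.
RECORDS = {
    100: Record("ans", 200),
    101: Record("ant@", 201),
    102: Record("anta", 202),
}

# Dictionary storage unit 32: headwords grouped by leading character string.
DIVIDED_DICTIONARIES = {
    200: ["answer", "answerable"],
    201: ["ant"],
    202: ["antacid", "antagonism", "antarctic"],
}


def belongs(headword, prefix):
    """True if headword falls under prefix, honouring the terminator."""
    return (headword + TERMINATOR).startswith(prefix)


def is_consistent(records, dictionaries):
    """Check that every headword sits under its record's leading string."""
    return all(belongs(word, rec.prefix)
               for rec in records.values()
               for word in dictionaries[rec.dictionary])
```

Appending the terminator before the comparison is what keeps "ant" in the "ant@" dictionary while "antacid" falls under "anta".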

[0039] The data processing device 2 includes an index search means 21 and a dictionary search means 22.

[0040] The index search means 21 extracts the leading character string serving as the index (for example, ans, ant@, anta, ...) from the data of the character string input from the input device 1. With the extracted leading character string as the index, it looks up, on the basis of the records 100, 101, 102, 103, 104, ..., the data of the divided dictionary 200, 201, 202, 203, 204, ... corresponding to the extracted leading character string.

[0041] The dictionary search means 22 refers to the divided dictionary 200, 201, 202, 203, 204, ... found by the index search means 21. It compares the remaining character data of the character string input from the input device 1, excluding the extracted leading character string (for example, ans, ant@, anta, ...), with the data of that divided dictionary, and thereby searches the headwords of the divided dictionary for the input character string.

[0042] Thus, in the dictionary data search apparatus according to Embodiment 1 shown in FIG. 1, the index search means 21 compares the character string input from the input device 1 with the character strings (for example, ans, ant@, anta, ...) of the records 100, 101, 102, 103, 104, ... stored in the index storage unit 31, and checks whether a divided dictionary 200, 201, 202, 203, 204, ... corresponding to the input character string exists.

[0043] The dictionary search means 22 retrieves from the dictionary storage unit 32 the divided dictionary 200, 201, 202, 203, 204, ... that the index search means 21 found on the basis of the records 100, 101, 102, 103, 104, ..., and searches it for the input character string.

[0044] For example, suppose that "antarctic" is input from the input device 1 and that the index search means 21 extracts the leading character string "anta".

[0045] The dictionary search means 22 retrieves the divided dictionary 202, whose leading character string is "anta", from the dictionary storage unit 32. Because the retrieved divided dictionary 202 is a set of headwords that all begin with "anta", the first four characters "anta" of the search string "antarctic" need not be compared.

[0046] According to this embodiment of the present invention, therefore, the number of characters compared between the input character string and the character strings in the dictionary can be reduced, so data search can be processed at high speed.
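The lookup of "antarctic" can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the divided dictionaries are assumed to store only the characters after the leading string, and the leading string is assumed to be found by trying successively longer prefixes of the input (the text does not say how the extraction is performed). All names are hypothetical.

```python
TERMINATOR = "@"  # marks "no character follows", as in "ant@"

# Leading character string -> remaining characters of each headword.
INDEX = {
    "ans": ["wer", "werable"],
    "ant@": [""],
    "anta": ["cid", "gonism", "rctic"],
}


def extract_prefix(index, word):
    """Index stage: return the leading string of word that is in the index."""
    padded = word + TERMINATOR
    for length in range(1, len(padded) + 1):
        if padded[:length] in index:
            return padded[:length]
    return None  # no divided dictionary corresponds to this word


def lookup(index, word):
    prefix = extract_prefix(index, word)
    if prefix is None:
        return False  # rejected on the index, dictionary never touched
    # Dictionary stage: only the characters after the prefix are compared.
    remainder = word[len(prefix):]
    return remainder in index[prefix]
```

For "antarctic" the index stage yields "anta", and the dictionary stage compares only "rctic". A word such as "antler" has no matching leading string and is rejected on the index alone, without any dictionary access.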

[0047] Next, the method of creating the search dictionary and the index in this embodiment of the present invention is described.

[0048] In this embodiment, the search dictionary (the plurality of divided dictionaries 200, 201, 202, 203, 204, ...) is created from a single base dictionary 5, as shown in FIG. 3.

[0049] The single base dictionary 5 is classified into an "a" dictionary through a "z" dictionary in units of the single characters a to z (step S1 in FIG. 5).

[0050] To create the search dictionary (the plurality of divided dictionaries 200, 201, 202, 203, 204, ...), the single base dictionary 5 is used first, as shown in FIG. 3. For example, the "a" dictionary 501 of the dictionary 5 is divided, in units of a common leading n characters (where n is an integer), for example the two leading characters "a@", "aa", ..., "an", ..., into a plurality of divided dictionaries, for example the "a@" dictionary, the "aa" dictionary, ..., the "an" dictionary, .... The dictionaries including the "a@" dictionary, the "aa" dictionary, ..., the "an" dictionary, ... form the dictionaries 6 resulting from the first division.

[0051] Next, it is checked whether the data amount contained in each of the divided dictionaries 6 (for example, the "a@" dictionary, the "aa" dictionary, ..., the "an" dictionary, ...) is within the allowable range for dictionary data (step S3 in FIG. 5). If the data amount of a divided dictionary exceeds the allowable range, the divided dictionary with the largest data amount among the divided dictionaries 6, for example the "an" dictionary 601, is detected. The detected dictionary 601 is re-divided, in units of a common leading (n+1) characters, for example the three leading characters "ana", ..., "ant", ..., "anz", into a plurality of divided dictionaries 7, for example the "ana" dictionary, ..., the "ant" dictionary, ..., the "anz" dictionary (step S4 in FIG. 5). If the data amount contained in the re-divided dictionaries 7 again exceeds the allowable range, the re-divided dictionary with the largest data amount, for example the "ant" dictionary 701, is re-divided with the number of leading characters changed to, for example, four. This yields divided dictionaries 8 in units of the four leading characters "ant@", "anta", ... (for example, the "ant@" dictionary, the "anta" dictionary, ...).

[0052] The above processing is repeated until the data amounts of all divided dictionaries have been adjusted to within the allowable range for dictionary data, and the divided dictionaries are thereby created.
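The repeated splitting can be sketched as follows. This is a minimal illustration under stated assumptions: the data amount of a divided dictionary is approximated by its number of headwords, the allowable range is a single upper limit, and the sample words for the "ante" and "anth" dictionaries are invented for the example. The function names are hypothetical.

```python
TERMINATOR = "@"  # marks "no character follows", as in "ant@"


def split_by_prefix(words, length):
    """Group words by their first `length` characters (terminator-padded)."""
    groups = {}
    for word in words:
        key = (word + TERMINATOR)[:length]
        groups.setdefault(key, []).append(word)
    return groups


def build_divided_dictionaries(words, limit, n=1):
    """Split until every divided dictionary holds at most `limit` headwords."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    dictionaries = split_by_prefix(sorted(set(words)), n)
    while dictionaries:
        # Detect the divided dictionary with the largest data amount.
        largest = max(dictionaries, key=lambda k: len(dictionaries[k]))
        if len(dictionaries[largest]) <= limit:
            break  # every dictionary is within the allowable range
        # Re-divide it using one more leading character.
        oversized = dictionaries.pop(largest)
        dictionaries.update(split_by_prefix(oversized, len(largest) + 1))
    return dictionaries


WORDS = ["answer", "answerable", "ant", "antacid", "antagonism",
         "antarctic", "ante", "anthem"]
RESULT = build_divided_dictionaries(WORDS, limit=3, n=2)
```

With these words the initial "an" dictionary is split into "ans" and "ant", and "ant" is split again into "ant@", "anta", "ante", and "anth", so the leading strings end up with different lengths. The resulting mapping from leading string to headwords also doubles as a simple index.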

【００５３】Next, the index creation method according to the embodiment of the present invention is described. Its basic configuration is to associate each divided dictionary, which consists of a plurality of headwords sharing a common leading character string, with that leading character string.

【００５４】A specific description follows. The single base dictionary 5 shown in FIG. 3 is classified into an a dictionary through a z dictionary in units of the single characters a to z. As shown in FIG. 4, the single leading characters (a, b, ..., z) of index 9 correspond to the a dictionary through the z dictionary of dictionary 5. For example, index 901, the entry of index 9 for the leading character a, is associated with the a dictionary 501 of dictionary 5 (step S2 in FIG. 5).

【００５５】Next, in the embodiment of the present invention, the search dictionary (the plurality of divided dictionaries 200, 201, 202, 203, 204, ...) is created, as shown in FIG. 3, by dividing the base dictionary into a plurality of divided dictionaries 6 (for example, the a@ dictionary, the aa dictionary, ..., the an dictionary, ...) in units of the common first n characters (where n is an integer), for example the first two characters a@, aa, ..., an, .... At that time, as shown in FIG. 4, the record 1001 of index 10 that corresponds to, for example, the first two characters an is associated with the matching divided dictionary 6, for example the an dictionary 601. The other divided dictionaries 6 (the a@ dictionary, ..., the b dictionary, ..., the z dictionary) are likewise associated with the records a@, ..., b, ..., z of index 10.

【００５６】As described above, when the data amount of a divided dictionary exceeds the allowable range for dictionary data, the divided dictionary with the largest data amount among the divided dictionaries 6, for example the an dictionary 601, is detected and re-divided into a plurality of divided dictionaries 7 (for example, the ana dictionary, ..., the ant dictionary, ..., the anz dictionary) in units of the common first (n+1) characters, for example the first three characters ana, ..., ant, ..., anz. If the data amount contained in the re-divided dictionaries 7 again exceeds the allowable range, the divided dictionary with the largest data amount among them, for example the ant dictionary 701, is re-divided with the number of leading characters changed to, for example, four, yielding divided dictionaries 8 (for example, the ant@ dictionary, the anta dictionary, ...) in units of the first four characters ant@, anta, .... In this process as well, the divided dictionaries 7 and 8 are associated with the records of indexes 11 and 12, as shown in FIGS. 3 and 4 (step S5 in FIG. 5).

【００５７】In the example shown in FIG. 4, the divided dictionary keyed by the first three characters ant (the ant dictionary) among the divided dictionaries 7 is associated with the record 1101 of index 11 that corresponds to the first three characters ant. The remaining divided dictionaries of dictionaries 5, 6, 7 and 8, whose description is omitted here, are likewise associated with the corresponding records of indexes 9, 10, 11 and 12.
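The association between leading character strings and divided dictionaries can be pictured as a list of records. The following Python sketch assumes a mapping from leading string to headwords as input; the function name `build_index` and the numbering that starts at 200 (chosen only to echo the reference numerals of FIG. 2) are hypothetical.

```python
def build_index(divided, first_number=200):
    """Associate each leading string with the number of its divided dictionary.

    `divided` maps a leading string (for example "anta") to its headwords.
    Returns (records, dictionaries), where records is a list of
    (leading_string, dictionary_number) pairs sorted by leading string.
    """
    records = []
    dictionaries = {}
    for number, prefix in enumerate(sorted(divided), start=first_number):
        records.append((prefix, number))            # index record
        dictionaries[number] = sorted(divided[prefix])  # divided dictionary
    return records, dictionaries


divided = {
    "anta": ["antarctic", "antacid", "antagonism"],
    "ans": ["answerable", "answer"],
    "ant@": ["ant"],
}
records, dictionaries = build_index(divided)
```

Keeping the records sorted by leading string is a design choice of this sketch; it lets a sequential scan visit them in dictionary order.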

【００５８】Accordingly, as shown in FIG. 2, the record 100 in the index storage unit 31, for example, points to the divided dictionary 200, whose headwords "answer" and "answerable" share the common leading character string "ans".

【００５９】In FIG. 2, the "@" in the leading character string of record 101 in the index storage unit 31 is a symbol that stands for a space or blank. It need not be "@", but it must be a symbol other than a letter. The reason is as follows. Records 102 to 104 identify the divided dictionaries 202 to 204 by four-character leading strings ("ant" plus one character). If record 101 identified the divided dictionary 201 by the three-character leading string "ant", then a search for an input string such as "antacid" would find a leading string in common with the input in both record 101 and record 102. The symbol prevents this ambiguity.
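The ambiguity that the "@" symbol prevents can be shown with a small Python check. It is a sketch under one assumption that the text does not state explicitly: an input shorter than the leading string is padded with the same symbol before comparison. The names `matching_prefixes` and `PAD` are hypothetical.

```python
PAD = "@"  # assumed blank symbol; any non-letter symbol would do


def matching_prefixes(prefixes, word):
    """Return every index leading string that matches the start of `word`."""
    hits = []
    for prefix in prefixes:
        # Take as many characters as the leading string has, padding if short.
        candidate = word[:len(prefix)].ljust(len(prefix), PAD)
        if candidate == prefix:
            hits.append(prefix)
    return hits


# Without the symbol, "antacid" matches two records.
ambiguous = matching_prefixes(["ant", "anta"], "antacid")
# With the symbol, each input matches exactly one record.
unique_long = matching_prefixes(["ant@", "anta"], "antacid")
unique_short = matching_prefixes(["ant@", "anta"], "ant")
```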

【００６０】Next, the search operation in the embodiment of the present invention is described with reference to FIGS. 1, 6 and 7.

【００６１】The input character string supplied from the input device 1 is passed to the index search means 21.

【００６２】The index search means 21 takes one leading character string from one index record in the index storage unit 31 (step S31).

【００６３】In step S32, it is determined whether a leading character string was obtained. If not, the input character string is judged not to be in the dictionary (step S38), and the processing of the index search means ends.

【００６４】In this case, the dictionary search means 22 is not executed, and a message stating that the input character string is not in the dictionary is displayed on the output device 4.

【００６５】If a leading character string was obtained, its length is taken as the comparison length (step S33), a comparison string of that length is taken from the input character string (step S34), and the leading character string is compared with the comparison string (step S35).

【００６６】If the strings differ, the search advances to the next index record (step S36) and returns to step S31.

【００６７】If the strings are the same, the dictionary indicated by the index record is retrieved (step S37), and the processing of the index search means ends.
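Steps S31 to S38 amount to a sequential scan over the index records. The Python sketch below follows that flow; the record layout (a list of pairs), the padding of short inputs with "@", and the name `index_search` are assumptions made for illustration.

```python
PAD = "@"  # assumed blank symbol used in index leading strings


def index_search(records, text):
    """Find the index record whose leading string matches the start of `text`.

    `records` is a list of (leading_string, dictionary_number) pairs.
    Returns the matching pair, or None if the input is not covered.
    """
    for prefix, number in records:                      # S31, S36: next record
        length = len(prefix)                            # S33: comparison length
        candidate = text[:length].ljust(length, PAD)    # S34: comparison string
        if candidate == prefix:                         # S35: compare
            return prefix, number                       # S37: dictionary found
    return None                                         # S32 -> S38: not in dictionary


records = [("ans", 200), ("ant@", 201), ("anta", 202)]
hit = index_search(records, "antarctic")
miss = index_search(records, "zebra")
```

A caller would skip the dictionary search and report "not in the dictionary" when the result is `None`, as paragraph 【００６４】 describes.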

【００６８】Next, the dictionary search means 22 receives the input character string and the leading character string of the index record, and calculates the comparison start position (step S39).

【００６９】A headword is taken from the dictionary retrieved in step S37 (step S40), and the input character string and the headword are compared character by character, starting from the comparison start position (step S41).

【００７０】If the comparison of the input character string with the headword (step S42) finds them identical, the dictionary data is retrieved (step S44) and the processing ends.

【００７１】If the headword is greater, that is, it sorts after the input character string, the input character string is judged not to be in the dictionary (step S46) and the processing ends.

【００７２】If the input character string is greater, it is checked whether the retrieved divided dictionary still contains further dictionary data (step S43). If no data remains, processing proceeds to step S46. If data remains, the search moves to the next entry (step S45) and returns to step S40.

【００７３】In other words, the comparison is repeated from the comparison start position calculated in step S39 until the input character string is found, the dictionary data is exhausted, or a headword greater than the input character string is reached.
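The loop of steps S39 to S46 can be sketched in Python as below. The entry layout (a sorted list of headword and data pairs), the placeholder data strings and the name `dictionary_search` are assumptions; Python's built-in string comparison stands in for the character-by-character comparison, which agrees with dictionary order for lowercase ASCII headwords.

```python
def dictionary_search(entries, text, prefix_len):
    """Search one divided dictionary, skipping the shared leading string.

    `entries` is a list of (headword, data) pairs sorted by headword, all
    sharing the same first `prefix_len` characters with `text`.
    """
    start = prefix_len                  # S39 (0-based; the text counts from 1)
    target = text[start:]
    for headword, data in entries:      # S40, S45: take the next headword
        rest = headword[start:]         # S41: compare from the start position
        if rest == target:              # S42: identical
            return data                 # S44: retrieve the dictionary data
        if rest > target:               # headword sorts after the input
            return None                 # S46: not in the dictionary
    return None                         # S43 -> S46: data exhausted


entries = [
    ("antacid", "data for antacid"),
    ("antagonism", "data for antagonism"),
    ("antarctic", "data for antarctic"),
]
found = dictionary_search(entries, "antarctic", 4)
```

The early exit when a headword sorts after the input relies on the headwords being stored in sorted order.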

【００７４】Next, the operation of the embodiment of the present invention is described using a concrete example. As shown in FIG. 2, suppose that the index storage unit 31 contains the record 102 with the leading character string "anta", and that the dictionary storage unit 32 contains the divided dictionary 202 whose headwords begin with "anta".

【００７５】Suppose now that the input character string "antarctic" is given.

【００７６】When the index search means 21 takes the leading character string "anta" from the index storage unit 31 (step S31), it stores 4 as the comparison length (step S33) and stores the first four characters "anta" of the input character string "antarctic" as the comparison string.

【００７７】The comparison string "anta" is compared with the index string "anta" (step S35). Because they are identical, the dictionary indicated by the index record is retrieved (step S37).

【００７８】The dictionary search means 22 stores 5 as the comparison start position (step S39) and takes the first headword, "antacid", from the divided dictionary 202 (step S40). It then compares "rctic", the fifth and subsequent characters of the input character string "antarctic", with "cid", the fifth and subsequent characters of the headword "antacid" (step S41).

【００７９】Comparing the input substring "rctic" with the headword substring "cid" from the first character onward (step S42) shows that the input character string is greater: its first character "r" comes after the headword's first character "c" in dictionary order. It is therefore checked whether the retrieved divided dictionary contains further dictionary data (step S43).

【００８０】Because dictionary data remains, the search moves to the next entry, "antagonism" (step S45), and takes the headword "antagonism" from the dictionary.

【００８１】The fifth and subsequent characters of the input character string, "rctic", are compared with the fifth and subsequent characters of the headword, "gonism" (step S41). Because the input character string is greater, the search moves to the next entry (step S45) and takes the headword "antarctic" from the dictionary.

【００８２】The fifth and subsequent characters of the input character string, "rctic", are compared with the fifth and subsequent characters of the headword, "rctic" (step S41). Because they are equal, the dictionary data is retrieved (step S44), the search ends, and data such as the Japanese translation of the input character string is displayed on the output device 4.

【００８３】In the embodiment shown in FIG. 1, the headwords of each divided dictionary are stored in a form that includes the leading character string indicated by the index. Alternatively, the headwords may be stored with the leading character string common to each divided dictionary removed. In FIG. 2, for example, the headwords of the divided dictionary indicated by the index leading string "ans" would be stored as "wer" instead of "answer" and "werable" instead of "answerable".

【００８４】With the dictionary changed in this way, the procedure shown in FIG. 7 needs only two changes: step S39 becomes "replace the input character string with its substring from the comparison start position", and step S41 becomes "compare data". This modification has the advantage of further reducing the size of the divided dictionaries.
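The prefix-stripped variant can be sketched in Python as follows. The helper names `strip_common_prefix` and `search_stripped` and the placeholder data values are hypothetical; the two marked lines correspond to the modified steps S39 and S41.

```python
def strip_common_prefix(prefix, entries):
    """Store each headword without the leading string shared by the dictionary."""
    n = len(prefix)
    return [(headword[n:], data) for headword, data in entries]


def search_stripped(stripped_entries, text, prefix_len):
    """Search a divided dictionary whose headwords omit the shared leading string."""
    target = text[prefix_len:]          # modified S39: cut the input once
    for rest, data in stripped_entries:
        if rest == target:              # modified S41: plain data comparison
            return data
        if rest > target:               # stored headword sorts after the input
            return None
    return None


stripped = strip_common_prefix(
    "ans", [("answer", "data A"), ("answerable", "data B")]
)
result = search_stripped(stripped, "answerable", 3)
```

Here `stripped` holds "wer" and "werable", as in the example of paragraph 【００８３】.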

【００８５】As described above, according to the embodiment of the present invention, each divided dictionary is a collection of words that begin with the leading string of its index record. A search within a divided dictionary therefore needs to compare only the characters that remain after removing, from each headword, as many characters as the index leading string contains. This reduces the number of characters compared against the dictionary data and thus speeds up data search.
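The saving can be made concrete by counting character comparisons for the "antarctic" example. The Python sketch below is a simplified cost model, not a measurement: it counts one comparison per character position examined and ignores length checks. The names `chars_compared` and `search_cost` are hypothetical.

```python
def chars_compared(a, b):
    """Count character positions examined when comparing a and b left to right."""
    count = 0
    for x, y in zip(a, b):
        count += 1
        if x != y:
            break
    return count


def search_cost(headwords, text, start):
    """Total characters compared while scanning sorted headwords for `text`."""
    total = 0
    for headword in headwords:
        total += chars_compared(headword[start:], text[start:])
        if headword[start:] >= text[start:]:
            break  # found, or passed the position where it would be
    return total


headwords = ["antacid", "antagonism", "antarctic"]
full_cost = search_cost(headwords, "antarctic", 0)     # compare whole headwords
skipped_cost = search_cost(headwords, "antarctic", 4)  # skip the shared "anta"
```

Under this model, `full_cost` is 19 and `skipped_cost` is 7; the difference of 12 is the four shared characters skipped for each of the three headwords visited.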

【００８６】Furthermore, because the headwords of a divided dictionary can be stored with the common leading character string removed, the size of the divided dictionaries can be reduced. This reduces memory usage, and the memory saved can be used to hold more dictionary data.

【００８７】(Embodiment 2) FIGS. 8, 9 and 10 show Embodiment 2 of the present invention.

【００８８】Next, Embodiment 2 of the present invention is described in detail with reference to the drawings. Referring to FIG. 8, Embodiment 2 differs from the embodiment shown in FIG. 1 in two respects: the data processing device 2 has an upper index search means 23 in addition to the configuration shown in FIG. 1, and the index storage unit 31 of the information storage device 3 is subdivided into an upper index storage unit 311 and divided index storage units 312.

【００８９】The upper index search means 23 determines which of the divided index storage units the input character string supplied from the input device 1 belongs to.

【００９０】The internal structure of the upper index storage unit is described with reference to FIG. 9. The upper index storage unit 311 consists of pairs of the leading character string of the first record of a divided index storage unit 312 and a number identifying that divided index storage unit 312.

【００９１】Each divided index storage unit 312 has the same structure as the index storage unit 31 in FIG. 1.

【００９２】The operation of Embodiment 2 of the present invention is described in detail with reference to the drawings. The embodiment of FIG. 1 provides no measure for the case where the divided dictionaries are finely subdivided and the index becomes excessively large as a result.

【００９３】In Embodiment 2, in such a case, the upper index search means 23 searches the upper index storage unit 311 to identify the relevant divided index storage unit 312.

【００９４】The index search means 21 checks the identified divided index storage unit 312 to determine whether a matching divided dictionary exists. This is the same processing as that by which the index search means 21 in FIG. 1 checks the index storage unit 31 for a divided dictionary. The dictionary search means 22 likewise performs the same processing as the dictionary search means 22 in FIG. 1.

【００９５】The processing of the upper index search means 23 is described with reference to the flowchart of FIG. 10.

【００９６】First, the number of the divided index to retrieve is initialized so that the index beginning with a would be retrieved (step S61).

【００９７】Next, one divided-index headword is taken from the upper index storage unit 311 (step S62) and compared with the input character string supplied from the input device 1 (step S63). If the input character string is smaller, the divided index storage unit 312 to retrieve is the one immediately before the entry just compared, so the number of the divided index to retrieve is obtained (step S67) and the divided index is retrieved (step S68).

【００９８】If in step S63 the input character string is greater, or the index headword and the input character string are spelled identically, the number representing the divided index is updated (step S64). The search then moves to the next index record (step S65) and checks whether an index record remains (step S66); if one does, processing returns to step S62. This is repeated until the index record to retrieve is found or no index records remain.
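The flow of steps S61 to S68 can be sketched in Python as a scan that remembers the last divided index whose first leading string does not exceed the input. The record layout, the example partition numbers, the padding of the input with "@" before comparison, and the name `upper_index_search` are all assumptions made for this illustration.

```python
PAD = "@"  # assumed blank symbol used in index leading strings


def upper_index_search(upper_records, text):
    """Pick the divided index that should contain the record for `text`.

    `upper_records` is a list of (first_leading_string, index_number) pairs,
    one per divided index, sorted by leading string.
    """
    if not upper_records:
        return None
    selected = upper_records[0][1]                      # S61: start at the first index
    for first_prefix, number in upper_records:          # S62, S65, S66
        length = len(first_prefix)
        candidate = text[:length].ljust(length, PAD)
        if candidate < first_prefix:                    # S63: input is smaller
            break                                       # S67: keep the previous one
        selected = number                               # S64: update the number
    return selected                                     # S68: retrieve this index


upper = [("a@", 0), ("ans", 1), ("b", 2)]
chosen = upper_index_search(upper, "antarctic")
```

The index search of Embodiment 1 would then run only on the selected divided index, so only that portion of the index needs to be held in memory at once.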

【００９９】As described above, according to Embodiment 2 of the present invention, even when the number of index headwords grows with the dictionary data and the index size exceeds the allowable range, dividing the index keeps memory usage down and allows a very large amount of data to be searched.

【０１００】

[Effects of the invention] As described above, according to the present invention, the number of characters compared against the dictionary data can be reduced, so data search can be sped up. The reason is that each divided dictionary is a collection of words that begin with the leading string of its index record, so a search within a divided dictionary needs to compare only the characters that remain after removing, from each headword, as many characters as the index leading string contains.

【０１０１】Furthermore, because the size of the divided dictionaries can be reduced, memory usage can be reduced, and the memory saved can be used to hold more dictionary data. The reason is that the headwords of a divided dictionary can be stored with the common leading character string removed.

【０１０２】Furthermore, as explained in the reason for the first effect, the comparison in the present invention is performed on the characters that remain after removing, from each headword, as many characters as the index leading string contains. Storing headwords with the common leading character string removed therefore does not slow down the comparison.

【０１０３】Furthermore, even when the number of index headwords grows with the dictionary data and the index size exceeds the allowable range, dividing the index keeps memory usage down and allows a very large amount of data to be searched.

[Brief description of the drawings]

FIG. 1 is a configuration diagram showing the dictionary data search device according to Embodiment 1 of the present invention.

FIG. 2 is a configuration diagram showing the data structures of the index storage unit and the dictionary storage unit in Embodiment 1 of the present invention.

FIG. 3 is a transition diagram showing the method of creating the dictionary storage unit in Embodiment 1 of the present invention.

FIG. 4 is a transition diagram showing the method of creating the index storage unit in Embodiment 1 of the present invention.

FIG. 5 is a flowchart showing the method of creating the index storage unit and the dictionary storage unit in Embodiment 1 of the present invention.

FIG. 6 is a flowchart showing the operation of the index search means in Embodiment 1 of the present invention.

FIG. 7 is a flowchart showing the operation of the dictionary search means in Embodiment 1 of the present invention.

FIG. 8 is a configuration diagram showing the dictionary data search device according to Embodiment 2 of the present invention.

FIG. 9 is a configuration diagram showing the internal structure of the upper index search means in Embodiment 2 of the present invention.

FIG. 10 is a configuration diagram showing the operation of the upper index search means in Embodiment 2 of the present invention.

FIG. 11 is a configuration diagram showing the structure of the index and dictionary in a conventional example.

FIG. 12 is a flowchart showing the operation of the conventional example.

[Explanation of symbols]

1 Input device
2 Data processing device
21 Index search means
22 Dictionary search means
23 Upper index search means
3 Information storage device
31 Index storage unit
311 Upper index storage unit
312 Divided index storage unit
32 Dictionary storage unit
4 Output device

Continuation of front page

(56) References: JP-A-5-54077 (JP, A); JP-A-3-118661 (JP, A); JP-A-60-225273 (JP, A); JP-A-64-58018 (JP, A); JP-A-5-334365 (JP, A); JP-A-59-136858 (JP, A); JP-A-7-129576 (JP, A); JP-A-60-263236 (JP, A); JP-A-55-83962 (JP, A); JP-A-5-266067 (JP, A)

(58) Fields searched (Int. Cl.⁷, DB name): G06F 17/30; G06F 17/28

Claims

(57) [Claims]

1. A dictionary data search method for searching for an input character string among the headwords of a dictionary, comprising: extracting a leading character string from the data of the input character string; using a plurality of divided dictionaries, each composed of a plurality of headwords sharing a common leading character string and each having its data amount limited to within a predetermined allowable range by changing the number of characters in the leading character string, together with a plurality of records in which, with the data of the leading character string serving as an index, the data of the leading character string is associated with the data of the divided dictionary corresponding to that leading character string; retrieving, with the data of the extracted leading character string as an index and on the basis of the records, the divided dictionary corresponding to the extracted leading character string; and searching for the input character string among the headwords of the divided dictionary by comparing the remaining character data of the input character string, excluding the extracted leading character string, with the data of the divided dictionary.

2. A dictionary data search device that comprises an input device, a dictionary storage unit, an index storage unit, index search means, and dictionary search means, and that searches for an input character string among the headwords of a dictionary, wherein: the input device inputs the character string to be searched for; the dictionary storage unit stores a plurality of divided dictionaries, each composed of a plurality of headwords sharing a common leading character string and each having its data amount limited to within a predetermined allowable range by changing the number of characters in the leading character string; the index storage unit stores a plurality of records in which, with the data of the leading character string serving as an index, the data of the leading character string is associated with the data of the divided dictionary corresponding to that leading character string; the index search means extracts a leading character string from the data of the input character string and, with the data of the extracted leading character string as an index, retrieves on the basis of the records the data of the divided dictionary corresponding to the extracted leading character string; and the dictionary search means refers to the divided dictionary retrieved by the index search means and searches for the input character string among the headwords of that divided dictionary by comparing the remaining character data of the input character string, excluding the extracted leading character string, with the data of the divided dictionary.

3. A search dictionary creation method for creating a search dictionary composed of a plurality of headwords against which an input character string is compared, wherein the search dictionary consists of a plurality of divided dictionaries, each composed of a plurality of headwords sharing a common leading character string, the method comprising: dividing a single base dictionary into a plurality of divided dictionaries in units of the common first n characters (where n is an integer); then checking whether the data amount contained in each divided dictionary is within the allowable range for dictionary data; when the data amount of a divided dictionary exceeds the allowable range, detecting the divided dictionary with the largest data amount among the divided dictionaries and re-dividing it into a plurality of divided dictionaries in units of the common first (n+1) characters; when the data amount contained in the re-divided dictionaries again exceeds the allowable range, re-dividing the divided dictionary with the largest data amount among the re-divided dictionaries with the number of leading characters changed; and thereby adjusting the data amounts of all divided dictionaries to within the allowable range for dictionary data to create the divided dictionaries.

4. An index creation method for associating a divided dictionary, composed of a plurality of headwords sharing a common leading character string, with that leading character string, the method comprising: dividing a base dictionary into a plurality of divided dictionaries in units of the common first n characters (where n is an integer); associating the first n characters with the divided dictionary corresponding to those first n characters; then checking whether the data amount contained in each divided dictionary is within the allowable range for dictionary data; when the data amount of a divided dictionary exceeds the allowable range, detecting the divided dictionary with the largest data amount among the divided dictionaries and re-dividing it into a plurality of divided dictionaries in units of the common first (n+1) characters; and associating the first (n+1) characters with the divided dictionary corresponding to those first (n+1) characters, thereby creating an index that associates the leading character strings with the divided dictionaries.

5. The index creation method according to claim 4, wherein the data amount of the index, which increases with the number of divided dictionaries to be re-divided, is divided and managed.