JPH096782A - Document processing method and apparatus

Info

Publication number: JPH096782A
Application number: JP7148724A
Authority: JP
Inventors: Tsuyoshi Yagisawa; 津義八木沢; Michio Aizawa; 道雄相澤; Minoru Fujita; 稔藤田
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 1995-06-15
Filing date: 1995-06-15
Publication date: 1997-01-10

Abstract

(57) [Abstract]

[Purpose] To provide a document processing method and apparatus that can proofread text efficiently.

[Configuration] The method comprises an unknown word detection step (S1, S2) of detecting an unknown word in a predetermined sentence based on a search of a word dictionary that holds at least the written forms of words; a correlation calculation step (S4) of calculating a correlation value between a character string contained in a predetermined document and the unknown word; and a selection step (S5) of selecting a character string with a high calculated correlation value as a correction word candidate for the unknown word.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a document processing method and apparatus, and more particularly to a document processing method and apparatus that detect and point out portions of a text that may contain errors so that the text can be proofread.

[0002]

[Prior Art] In general, a document processing apparatus that proofreads text analyzes the text and, within the analysis result, treats unknown word portions (words not found in the word dictionary, portions that cannot be analyzed, and the like) as possible errors and points them out.

[0003]

[Problem to Be Solved by the Invention] However, because a conventional apparatus of this type relies on words missing from the word dictionary and unanalyzable portions as its clues, it can only point out that an unknown word portion may be an error. It cannot suggest what word or phrase the portion should be corrected to. The present invention has been made in view of this conventional example, and its object is to provide a document processing method and apparatus that can proofread text efficiently.

[0004]

[Means for Solving the Problem] To achieve the above object, the document processing method and apparatus of the present invention have the following configuration. The method comprises an unknown word detection step of detecting an unknown word in a predetermined sentence based on a search of a word dictionary that holds at least the written forms of words; a correlation calculation step of calculating a correlation value between a character string contained in a predetermined document and the unknown word; and a selection step of selecting a character string with a high calculated correlation value as a correction word candidate for the unknown word.

[0005] Another aspect of the invention comprises unknown word detection means for detecting an unknown word in a predetermined sentence based on a search of a word dictionary that holds at least the written forms of words; correlation calculation means for calculating a correlation value between a character string contained in a predetermined document and the unknown word; and selection means for selecting a character string with a high calculated correlation value as a correction word candidate for the unknown word.

[0006]

[Operation] With the above configuration, an unknown word is detected in a predetermined sentence based on a search of a word dictionary that holds at least the written forms of words, a correlation value between a character string contained in a predetermined document and the unknown word is calculated, and a character string with a high calculated correlation value is selected as a correction word candidate for the unknown word.

[0007] In the other aspect of the invention, the unknown word detection means detects an unknown word in a predetermined sentence based on a search of a word dictionary that holds at least the written forms of words, the correlation calculation means calculates a correlation value between a character string contained in a predetermined document and the unknown word, and the selection means selects a character string with a high calculated correlation value as a correction word candidate for the unknown word.

[0008]

[Embodiment] One key point of the configuration of a document processing apparatus according to an embodiment of the present invention is summarized first, followed by the detailed description. The document processing apparatus of this embodiment comprises a document file that holds a text consisting of a plurality of sentences; a similar portion collection unit that, for each unknown word portion produced by checking a text, collects from the document file the portions similar to the character string of the unknown word portion; a similar portion collection result holding unit that stores the information collected by the similar portion collection unit; and a correction candidate estimation unit that, based on the information held in the similar portion collection result holding unit, estimates a correction candidate for the unknown word portion held in the sentence check result holding unit. By presenting a correction candidate for each unknown word portion, the apparatus reduces the user's proofreading burden and allows editing to be carried out efficiently.

[0009] A document processing method and apparatus according to an embodiment of the present invention are described in detail below with reference to the drawings. FIG. 1 is a block diagram outlining the processing configuration of a document processing apparatus according to an embodiment of the present invention. In the figure, 1 is a sentence holding unit that holds text. 2 is a sentence check processing unit that analyzes the text held in the sentence holding unit 1 and, using unknown word portions (words not in the word dictionary, portions that cannot be analyzed, and the like) as clues, identifies and detects portions of the text that may contain errors. 3 is the word dictionary used by the sentence check processing unit 2, and 4 is the set of grammar rules used by the sentence check processing unit 2.

[0010] 5 is a sentence check result holding unit that holds the results of the sentence check processing unit 2. 6 is a document file that holds a text consisting of a plurality of sentences. 7 is a similar portion collection processing unit that, for each unknown word portion held in the sentence check result holding unit 5, collects from the document file 6 the character strings similar to the character string of the unknown word portion. 8 is a similar portion collection result holding unit that stores the information collected by the similar portion collection processing unit 7. 9 is a correction candidate word estimation processing unit that, based on the information held in the similar portion collection result holding unit 8, estimates a correction candidate for the unknown word portion held in the sentence check result holding unit 5.

[0011] 10 is a sentence check result display unit that, based on the sentence check results held in the sentence check result holding unit 5, indicates the portions that may contain errors and presents their correction candidates. FIG. 2 is a flowchart showing the processing procedure for the configuration shown in FIG. 1. With reference to that figure, the processing procedure of the document processing method according to an embodiment of the present invention is described using the input sentence shown in 5a of FIG. 5 as an example: 「私はその方法によて問題を解決できる。」 ("I can solve the problem by that method," in which によって, "by," is mistyped as によて).

[0012] In FIG. 2, step S1 first checks whether text is held in the sentence holding unit 1. If the sentence holding unit 1 is empty, processing waits until text is held there. Once text is held in the sentence holding unit 1, processing moves to step S2. In step S2, the text held in the sentence holding unit 1 is checked. The sentence check processing unit 2 performs this check using the word dictionary 3 and the grammar rules 4, and as a result stores various information about each unknown word portion in the sentence check result holding unit 5: the unknown word portion (its character string), the detection position, the length (number of characters), and the correction candidate (for which nothing has been set at this point).

[0013] 5b of FIG. 5 shows an example of a check result held in the sentence check result holding unit 5. In this example, 「によて」 matches nothing in the grammar rules 4 or the word dictionary 3 and is therefore judged to be an unknown word portion. The stored information indicates that its detection position is the 7th character from the beginning of the sentence and that its length (number of characters) is 3.
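The check result record of paragraphs [0012] and [0013] can be illustrated with a small sketch. It is only an approximation under stated assumptions: the patent's sentence check uses both a word dictionary and grammar rules, whereas this sketch uses a greedy longest-match lookup against a toy dictionary and ignores grammar. The names `UnknownWord`, `detect_unknown_words`, and `TOY_DICTIONARY` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class UnknownWord:
    text: str       # character string of the unknown word portion
    position: int   # 1-based position of its first character in the sentence
    length: int     # number of characters
    candidates: list = field(default_factory=list)  # empty until step S5


def detect_unknown_words(sentence, dictionary):
    """Greedy longest-match lookup (an assumption, not the patent's parser).

    Characters not covered by any dictionary entry are merged into runs,
    and each run is reported as one unknown word portion.
    """
    max_len = max(map(len, dictionary))
    found = []
    run_start = None  # start index of the current uncovered run, if any
    i = 0
    while i < len(sentence):
        match = 0
        for n in range(min(max_len, len(sentence) - i), 0, -1):
            if sentence[i:i + n] in dictionary:
                match = n
                break
        if match:
            if run_start is not None:
                found.append(UnknownWord(sentence[run_start:i], run_start + 1, i - run_start))
                run_start = None
            i += match
        else:
            if run_start is None:
                run_start = i
            i += 1
    if run_start is not None:
        found.append(UnknownWord(sentence[run_start:], run_start + 1, len(sentence) - run_start))
    return found


# Toy dictionary chosen only so that the example sentence can be segmented.
TOY_DICTIONARY = {"私", "は", "その", "方法", "問題", "を", "解決", "できる", "。"}
result = detect_unknown_words("私はその方法によて問題を解決できる。", TOY_DICTIONARY)
print(result[0].text, result[0].position, result[0].length)  # によて 7 3
```

With this toy dictionary the sketch reproduces the record described for 5b of FIG. 5: the unknown word portion 「によて」 at position 7 with length 3, and no correction candidate yet.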

[0014] Step S3 checks whether an unknown word portion has been detected in the sentence check result holding unit 5. If one has been detected, processing moves to the similar portion collection processing of step S4. If none has been detected, processing moves to step S6. In step S4, the similar portion collection processing unit 7 collects, for the unknown word portion held in the sentence check result holding unit 5, similar character strings from the document file 6, and stores that information (each similar character string and its number of occurrences) in the similar portion collection result holding unit 8.
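A minimal sketch of the collection in step S4 follows. It assumes, based on the remark in paragraph [0021] that the embodiment looks at runs of consecutive hiragana, that every hiragana run in the document file is a candidate. The patent does not say how "similar" strings are filtered at this stage, so this sketch collects all runs with their counts and leaves the scoring to step S5. The function name and the sample text are hypothetical, and the sample is not the content of FIG. 3.

```python
import re
from collections import Counter

# Runs of one or more characters in the Unicode hiragana range (U+3041 to U+3096).
HIRAGANA_RUN = re.compile(r"[\u3041-\u3096]+")


def collect_hiragana_runs(document_text):
    """Return each consecutive-hiragana run with its number of occurrences."""
    return Counter(HIRAGANA_RUN.findall(document_text))


# Made-up sample text standing in for document file 6.
sample = "この方法によって問題を解く。例によって結果を示す。初心者にしては良い。"
counts = collect_hiragana_runs(sample)
print(counts["によって"], counts["にしては"])  # 2 1
```

The resulting mapping of string to occurrence count corresponds to the two kinds of information the text says are stored in the similar portion collection result holding unit 8.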

[0015] 5c of FIG. 5 shows an example of the information obtained about the similar character strings. The "No." column stores a serial number that identifies each similar character string. For example, entry No. 15 records that the similar character string is 「にしては」, that its similarity to 「によて」 in 5b of FIG. 5 is 0.58, and that 「にしては」 occurs once in the document file 6. The correction candidate priority column is filled in by the processing of step S5.

[0016] In step S5, the correction candidate word estimation processing unit 9 refers to the similar portion collection result holding unit 8, estimates the character string most similar to the unknown word portion in the sentence check result held in the sentence check result holding unit 5, and stores it as the correction candidate for that unknown word portion in the sentence check result holding unit 5. The similarity L is calculated as follows:

L = (m / La + m / Lb) / 2

where
m: number of matching characters
La: length (number of characters) of the character string of the unknown word portion
Lb: length (number of characters) of the character string of the similar portion

When candidates have the same similarity, priority is given in descending order of number of occurrences.
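The similarity formula can be sketched as below. The patent defines m only as the "number of matching characters" and does not say how matches are counted. This sketch assumes m is the length of the longest common subsequence, which is one reading that reproduces the 0.58 figure given for 「にしては」 in paragraph [0015]. The function names are hypothetical.

```python
def matching_chars(a, b):
    """Length of the longest common subsequence of a and b (assumed meaning of m)."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(unknown, candidate):
    """L = (m / La + m / Lb) / 2, as given in the text."""
    if not unknown or not candidate:
        return 0.0
    m = matching_chars(unknown, candidate)
    return (m / len(unknown) + m / len(candidate)) / 2


print(round(similarity("によて", "にしては"), 2))  # 0.58
print(similarity("によて", "によって"))            # 0.875
```

For 「によて」 against 「にしては」, m = 2, La = 3, and Lb = 4, so L = (2/3 + 2/4) / 2, which rounds to 0.58. Against 「によって」, m = 3 and L = (3/3 + 3/4) / 2 = 0.875.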

[0017] The priority calculated in this way is stored in the correction candidate priority column of 5c in FIG. 5. Next, as shown in 5d of FIG. 5, the entry with the highest correction candidate priority in 5c of FIG. 5 is selected and stored in the correction candidate column of the sentence check result holding unit 5 as the correction candidate for the unknown word portion. In step S6, the sentence check result held in the sentence check result holding unit 5 is displayed on the sentence check result display unit 10, and processing ends.
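The priority ordering of steps S5 and S6 (higher similarity first, ties broken by the larger number of occurrences) can be sketched as a sort over rows shaped like the table in 5c of FIG. 5. Apart from the 「にしては」 row (similarity 0.58, one occurrence), which the text gives, the rows and counts below are made up for illustration, and `rank_candidates` is a hypothetical name.

```python
def rank_candidates(rows):
    """Sort (string, similarity, occurrences) rows by priority.

    Higher similarity comes first; equal similarity is ordered by
    the larger number of occurrences.
    """
    return sorted(rows, key=lambda row: (-row[1], -row[2]))


rows = [
    ("にしては", 0.58, 1),   # values stated in the text for entry No. 15
    ("によって", 0.875, 4),  # hypothetical count
    ("により", 0.67, 2),     # hypothetical row
    ("にとって", 0.58, 3),   # hypothetical row, ties with にしては on similarity
]
ranked = rank_candidates(rows)
best = ranked[0][0]
print(best)  # によって
```

The position of each row in `ranked` plays the role of the correction candidate priority, and `best` is what would be written to the correction candidate column in 5d of FIG. 5.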

[0018] In the example of 5e in FIG. 5, the original input sentence, the phrase 「によて」 detected in it as an unknown word, and the corresponding selected correction candidate 「によって」 are displayed together. In the original input sentence, the unknown word portion is underlined to show its position. FIG. 3 shows the contents of the document file 6 used as an example in the processing described above.
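The display described for 5e of FIG. 5 can be approximated in plain text. This sketch wraps the unknown word portion in brackets as a stand-in for the underline, since plain strings cannot carry underlining. The function names and the report layout are assumptions, not the patent's screen design.

```python
def mark_unknown(sentence, position, length, open_mark="[", close_mark="]"):
    """Wrap the unknown word portion (1-based position) in marker characters."""
    start = position - 1
    end = start + length
    return sentence[:start] + open_mark + sentence[start:end] + close_mark + sentence[end:]


def format_report(sentence, position, length, candidate):
    """Show the marked sentence, the detected phrase, and its correction candidate."""
    detected = sentence[position - 1:position - 1 + length]
    return "\n".join([
        mark_unknown(sentence, position, length),
        "detected: " + detected,
        "candidate: " + candidate,
    ])


marked = mark_unknown("私はその方法によて問題を解決できる。", 7, 3)
print(marked)  # 私はその方法[によて]問題を解決できる。
print(format_report("私はその方法によて問題を解決できる。", 7, 3, "によって"))
```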

[0019] FIG. 4 shows an example of the contents of the similar portion collection result holding unit 8, which stores similar phrases and their numbers of occurrences. FIG. 6 shows an example of a hardware configuration that executes the document processing shown in the flowchart above.

[0020] Referring to FIG. 6, the CPU 200 controls the entire document processing apparatus, sequentially reading and executing a program that corresponds to the flowchart above and is stored in advance in the memory 202. The memory 202 also stores the word dictionary 3, the grammar rules 4, the document file 6, and the contents of the similar portion collection result holding unit 8 and the sentence check result holding unit 5. The keyboard 203 and the pointing device 204 are used to input commands, text data, and the like. The display monitor 201 is an example of the sentence check result display unit and displays the processing results described above.

(Other Embodiments)

(01) In the above embodiment, the similar portion collection processing was treated as transient: it is run for each unknown word portion in the input text, followed by the correction candidate estimation processing. Alternatively, the similar portion collection processing may be made permanent. That is, information on similar phrases may be collected from the document file in advance and then used for the correction candidate estimation processing.

[0021] (02) The above embodiment focused on unknown word portions consisting of consecutive hiragana characters, and the similar portion collection processing and the correction candidate estimation processing likewise looked at runs of consecutive hiragana in the text of the document file when collecting similar phrases and estimating correction candidates. The invention is not limited to this. An arbitrary character string may be treated as the unknown word portion, and in the similar portion collection processing and the correction candidate estimation processing a tolerance may be set on the length of the character string of the unknown word portion, with similar portions collected and correction candidates estimated within that tolerance.
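The length-tolerance variant in (02) can be sketched as follows. The patent does not say how substrings are enumerated, so this sketch assumes a brute-force scan of every substring whose length lies within the tolerance of the unknown word's length. The function name, the parameter names, and the sample text are hypothetical.

```python
from collections import Counter


def collect_substrings(document_text, target_len, tolerance):
    """Count every substring whose length is within target_len +/- tolerance."""
    shortest = max(1, target_len - tolerance)
    longest = target_len + tolerance
    counts = Counter()
    for n in range(shortest, longest + 1):
        for i in range(len(document_text) - n + 1):
            counts[document_text[i:i + n]] += 1
    return counts


# An unknown word of 3 characters with a tolerance of 1 gives lengths 2 to 4.
counts = collect_substrings("例によって結果を示す。", target_len=3, tolerance=1)
print(counts["によって"])  # 1
```

A brute-force scan like this also yields substrings that cross word boundaries and punctuation, so a practical implementation would probably prune the candidates before scoring them.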

[0022] (03) In the above embodiment, the correction candidate estimation processing calculates the similarity from the number of characters that match between the character string of the unknown word portion in the sentence check result held in the sentence check result holding means and the character string of each similar phrase registered in the similar portion collection result holding unit, together with the number of occurrences, and estimates the correction candidate for the unknown word portion accordingly. Any other calculation method may be used for the similarity, provided that it can compute a similarity and allows the candidates to be prioritized.

[0023] (04) In the above embodiment, the correction candidate estimation processing presents the single most similar correction candidate. The invention is not limited to this, and the processing may present a plurality of correction candidates in descending order of similarity.

(05) In the above embodiment, the similar portion collection processing and the correction candidate estimation processing are performed automatically. Alternatively, the user may interactively choose whether to run the similar portion collection processing or the correction candidate estimation processing, so that either can be skipped.

[0024] (06) In the above embodiment, the sentence check result display merely underlines the portions flagged by the sentence check. The invention is not limited to this. Any form that the user can perceive may be used, such as displaying the portion in color, changing the font, or using sound.

[0025] (07) In the above embodiment, the sentence check result display underlines the portions flagged by the sentence check and presents the correction candidate for each detection in a separate display area. The invention is not limited to this. For example, information about a detected phrase, such as its correction candidates, may be presented when the user clicks the flagged portion with a mouse or similar device.

[0026] (08) The above embodiment was described using Japanese as an example, but the invention is not limited to this and can be applied to any language, such as English or German.

The present invention may be applied to a system composed of a plurality of devices or to an apparatus consisting of a single device. It goes without saying that the present invention also applies where it is achieved by supplying a program to a system or an apparatus.

[0027] As described above, according to this embodiment, similar portions are collected from the document file for each unknown word portion obtained by the sentence check processing, and the most similar one can be estimated as its correction candidate. As a result, the apparatus can not only point out unknown word portions that may be errors but also present correction candidates for them, so that text can be proofread efficiently.

[0028]

[Effect of the Invention] As described above, according to the present invention, text can be proofread efficiently.

[Brief description of drawings]

FIG. 1 is a functional block diagram according to an embodiment of the present invention.

FIG. 2 is a flowchart showing a processing procedure according to the embodiment of the present invention.

FIG. 3 is an example of the contents of a document file, used to explain the embodiment of the present invention.

FIG. 4 is an example of the contents of the similar portion collection result holding unit, used to explain the embodiment of the present invention.

FIG. 5 is an example showing the course of specific processing, used to explain the embodiment of the present invention.

FIG. 6 is a diagram showing the configuration of the document processing apparatus according to the embodiment of the present invention.

[Explanation of symbols]

1 sentence holding unit
2 sentence check processing unit
3 word dictionary
4 grammar rules
5 sentence check result holding unit
6 document file
7 similar portion collection processing unit
8 similar portion collection result holding unit
9 correction candidate word estimation processing unit
10 sentence check result display unit

Claims

[Claims]

1. A document processing method comprising: an unknown word detecting step of detecting an unknown word in a predetermined sentence based on a search of a word dictionary having at least the written forms of words; a correlation calculating step of calculating a correlation value between a character string included in a predetermined document and the unknown word; and a selecting step of selecting a character string having a high calculated correlation value as a correction word candidate for the unknown word.

2. The document processing method according to claim 1, wherein the correlation value is calculated based on the number of matching characters between a character string included in a predetermined document and the unknown word.

3. The document processing method according to claim 2, wherein the correlation value is calculated based on the number of matching characters between a character string included in a predetermined document and the unknown word, and further based on the number of characters of the unknown word and the number of characters of the character string.

4. The document processing method according to claim 3, wherein the correlation value (L) is obtained from the number of matching characters (m) between a character string included in a predetermined document and the unknown word, the number of characters of the unknown word (La), and the number of characters of the character string (Lb) by the following formula: L = (m / La + m / Lb) / 2.

5. The document processing method according to claim 1, wherein, when a plurality of character strings having a high calculated correlation value have the same correlation value, the selecting step selects the character string that appears more frequently in the predetermined document as the correction word candidate for the unknown word.

6. The document processing method according to claim 1, further comprising a first display step of displaying the unknown word and a correction word candidate selected corresponding to the unknown word.

7. The document processing method according to claim 1, further comprising a second display step of displaying the predetermined sentence and marking the position of the unknown word in the predetermined sentence.

8. A document processing apparatus comprising: unknown word detection means for detecting an unknown word in a predetermined sentence based on a search of a word dictionary having at least the written forms of words; correlation calculation means for calculating a correlation value between a character string included in a predetermined document and the unknown word; and selection means for selecting a character string having a high calculated correlation value as a correction word candidate for the unknown word.

9. The document processing apparatus according to claim 8, wherein the correlation value is calculated based on the number of matching characters between a character string included in a predetermined document and the unknown word.

10. The document processing apparatus according to claim 9, wherein the correlation value is calculated based on the number of matching characters between a character string included in a predetermined document and the unknown word, and further based on the number of characters of the unknown word and the number of characters of the character string.

11. The document processing apparatus according to claim 10, wherein the correlation value (L) is obtained from the number of matching characters (m) between a character string included in a predetermined document and the unknown word, the number of characters of the unknown word (La), and the number of characters of the character string (Lb) by the following formula: L = (m / La + m / Lb) / 2.

12. The document processing apparatus according to claim 8, wherein, when a plurality of character strings having a high calculated correlation value have the same correlation value, the selection means selects the character string that appears more frequently in the predetermined document as the correction word candidate for the unknown word.

13. The document processing apparatus according to claim 8, further comprising a first display unit that displays the unknown word and a correction word candidate selected corresponding to the unknown word.

14. The document processing apparatus according to claim 8, further comprising second display means for displaying the predetermined sentence and marking the position of the unknown word in the predetermined sentence.