JPH04302070A

JPH04302070A - Character recognizing device

Info

Publication number: JPH04302070A
Application number: JP3065786A
Authority: JP
Inventors: Etsuo Ito; 悦雄伊藤; Kimito Takeda; 武田　公人; Koichi Hasebe; 浩一長谷部; Masaie Amano; 天野　真家
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1991-03-29
Filing date: 1991-03-29
Publication date: 1992-10-26
Anticipated expiration: 2015-09-04
Also published as: JP3083171B2

Abstract

PURPOSE: To make correction work more efficient by displaying, distinguishably from each other, characters selected from among plural character patterns whose similarity is higher than a prescribed value and characters judged to be absent from a vocabulary dictionary. CONSTITUTION: When input characters are converted from image data to character data, words that may be erroneous are displayed as candidate words requiring correction. Characters selected from among several similar patterns during pattern matching with a recognition dictionary 5, and characters whose matching results do not exist in a vocabulary dictionary 7 (that is, are judged to be unknown words), are displayed on a display unit 2 distinguishably from each other. The operator can thus refer to the displayed state and efficiently perform post-editing such as correction.

Description

[Detailed description of the invention]

[Purpose of the invention]

[0001]

[Industrial field of application] The present invention relates to a character recognition device.

[0002]

[Prior art] With the recent shift to electronic document processing, character reading devices have been developed that read documents printed on paper and convert them to electronic media. In a conventional character recognition device, a reading unit reads a manuscript on which a document is written. The reading unit captures the manuscript not as a collection of characters but as image data made up of a collection of dots. A character recognition unit then cuts out an arbitrary region of the image data and performs pattern matching against the standard character patterns stored in a recognition dictionary. If the character recognition unit finds a match, the character is output to an output unit and recorded in a recording unit. Through these steps the characters on the manuscript are converted into codes inside the device. However, ambiguity can remain between similar characters, such as the digit '0' and the letter 'O', or the lowercase 'l' (el), the uppercase 'I' (eye), and the digit '1', and such portions had to be pointed out to the operator.

[0003] Furthermore, even when the character recognition unit judges that there is no ambiguity, a reading error may still occur, for example because characters on the manuscript are faint. As post-processing, therefore, the recognized characters are grouped into words and each word is checked against a vocabulary dictionary (hereinafter, unknown word determination); if the word is not found, the operator has to be told that it may be a reading error. Reading errors and cases where ambiguity remains are together called recognition errors.

[0004] However, in a conventional character recognition device, even when candidate recognition error locations are pointed out to the operator, the operator cannot tell whether a given candidate was flagged because character recognition judged it ambiguous or because unknown word determination judged it an unknown word, so correcting the recognition results is burdensome. In addition, the device sometimes flags correctly recognized words because the vocabulary dictionary is incomplete, or flags correctly recognized words because of ambiguity, which is a nuisance.

[0005]

[Problems to be solved by the invention] As described above, in a conventional character recognition device, even when candidate recognition error locations are pointed out to the operator, the operator cannot tell whether a candidate was flagged because of ambiguity in character recognition or because it is an unknown word, so the subsequent correction work is burdensome. There is also the problem that a correctly recognized word cannot be prevented from being treated as a recognition error because of an incomplete vocabulary dictionary or a faulty ambiguity judgment.

[0006] The present invention has been made in view of these circumstances, and its object is to provide a character recognition device that shows the operator the cause of the suspected recognition error at each candidate location and thereby makes correction work more efficient.

[Structure of the invention]

[0007]

[Means for solving the problems] A character recognition device according to the present invention comprises: an input unit that inputs characters as an image pattern; a character recognition unit that matches the input image pattern against character patterns in a pre-stored recognition dictionary and selects the character of a character pattern whose similarity is at or above a predetermined value; an unknown word determination unit that determines whether a character string made up of the obtained characters exists among the character strings of a pre-stored vocabulary dictionary; and a display unit that displays the obtained characters. A character string containing a character that the character recognition unit selected from among a plurality of character patterns with similarity at or above the predetermined value is stored with identifier A; a character string that the unknown word determination unit judged to be an unknown word not present in the vocabulary dictionary is stored with identifier B; and when a predetermined condition is satisfied, the character string is displayed together with its corresponding identifier.
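The tagging rule in this paragraph can be illustrated with a minimal Python sketch. It is not the patented implementation: the similarity scores, the `THRESHOLD` value, the `RecognizedWord` class, and the `"?"` placeholder for a character with no match are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

THRESHOLD = 0.8  # hypothetical "predetermined value" for similarity


@dataclass
class RecognizedWord:
    text: str
    identifiers: set = field(default_factory=set)  # may hold "A", "B", or both


def recognize_char(scores, threshold=THRESHOLD):
    """Return (best_char, is_ambiguous) from a {char: similarity} mapping."""
    candidates = {c: s for c, s in scores.items() if s >= threshold}
    if not candidates:
        return None, True  # assumption: no match is treated as ambiguous
    best = max(candidates, key=candidates.get)
    return best, len(candidates) > 1  # several patterns cleared the threshold


def recognize_word(char_scores, vocabulary, threshold=THRESHOLD):
    """Build a word from per-character scores and attach identifiers A and B."""
    chars, ambiguous = [], False
    for scores in char_scores:
        ch, amb = recognize_char(scores, threshold)
        chars.append(ch if ch is not None else "?")  # "?" is a placeholder
        ambiguous = ambiguous or amb
    word = RecognizedWord("".join(chars))
    if ambiguous:
        word.identifiers.add("A")  # chosen from among several close patterns
    if word.text not in vocabulary:
        word.identifiers.add("B")  # not found in the vocabulary dictionary
    return word
```

A word can carry both identifiers at once; how they are then shown to the operator is left to the display condition described in the text.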

[0008]

[Operation] According to the present invention, words in which an error may have occurred when the input characters were converted from image data to character data are displayed as candidate words requiring correction. In doing so, the device distinguishes, in display or in storage, between (A) the case where several close patterns were found during pattern matching with the recognition dictionary and (B) the case where the matched result was judged not to exist in the vocabulary dictionary. The operator can then carry out post-editing such as correction while referring to the displayed state.

[0009]

[Embodiment] An embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of a character recognition device according to an embodiment of the present invention.

[0010] In FIG. 1, the input unit 1 is used to enter the commands needed to operate the character recognition device and to enter corrected characters for the recognition results; a keyboard, mouse, touch panel, or the like is normally used. The display unit 2 displays the commands and characters entered from the input unit 1, the reading results from the reading unit 3 (described below), the intermediate state and results of character recognition by the character recognition unit 4, and the indications made by the correction candidate indication unit 8; a bitmap display or the like is normally used. The screen can also be split in two, with the image read by the reading unit shown on one side and the corresponding characters recognized by the device shown on the other. The reading unit 3 reads the manuscript; for example, a scanner equipped with a line image sensor is used. The resolution of this scanner determines which kinds of characters can be read. The character recognition unit 4 recognizes, as characters, the image data obtained when the reading unit 3 reads the manuscript. The reading unit 3 does not read the manuscript character by character; it scans sequentially from one edge of the manuscript and obtains image data by expressing each point as a binary value indicating whether or not information is present there. Consequently, the higher the resolution, the more finely the manuscript is read and the more complex the characters that can be read. Besides being read by the reading unit 3, the image data supplied to the character recognition unit 4 can also be obtained in other ways, such as by inputting an image file 12. The character recognition unit 4 takes an arbitrary region of the image data and matches it against the standard character patterns recorded in advance in the recognition dictionary 5. This pattern matching is performed over all of the image data, and a standard character pattern whose similarity is at or above a predetermined value is regarded as a match and recognized as the character corresponding to that image data. The unknown word determination unit 6 determines, word by word, whether the result recognized by the character recognition unit 4 is recorded in the vocabulary dictionary 7. The correction candidate indication unit 8 points out to the operator characters and words that may have been misrecognized: ambiguous characters, such as when the read pattern matched several patterns in the pattern matching by the character recognition unit 4 or when a reading error occurred because the read pattern matched no pattern, and words judged to be unknown words by the unknown word determination unit 6. The identifier display unit 9 displays, for each word pointed out by the correction candidate indication unit 8, whether it was flagged by the judgment of the character recognition unit or of the unknown word determination unit. The recording unit 10 records the image data read by the reading unit 3 and the intermediate and final results of character recognition by the character recognition unit 4. The control unit 11 controls the above units and mediates data between them.

[0011] FIG. 2 is a flowchart showing the overall processing flow of the character recognition device according to the present invention. To digitize a manuscript as character data, the manuscript is first read by the reading unit 3 (S201). The character recognition unit 4 then performs character recognition on the reading result (S202). Any words in the recognition result for which ambiguity remains are designated A; for example, a flag SP-F indicating ambiguity is set and stored in the recording unit 10 together with the word. Next, the unknown word determination unit 6 performs unknown word determination on the recognized data (S203). When character recognition yields several patterns with similarity at or above the predetermined value, unknown word determination may be applied only to the candidate with the highest similarity, or to all of the candidates. Any words found to be unknown words are designated B; for example, a flag undef-F indicating an unknown word is set and stored in the recording unit 10 together with the word. Recognition is now complete and the locations that may need correction have been identified, so the data is displayed on the display unit 2 for confirmation and correction by the operator (S204). Then, in accordance with the operator's instructions, the candidate words requiring correction are pointed out and the identifier for each is displayed (S205). The first candidate, which has the highest similarity, is displayed with its identifier; if there are second and lower candidates, they may also be displayed with their identifiers in response to a next-candidate key or the like. Referring to these indications, the operator corrects the document data from the input unit 1 (S206) to obtain the final result. When character recognition finds no pattern at all with similarity at or above the predetermined value, unknown word determination and display of character data are not possible, so only the ambiguity identifier may be displayed, or a separate "reading error" message may be displayed.
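The flag-setting part of this flow (S202 and S203) might be sketched in Python as follows. The record layout, the function name `flag_words`, and the `check_all` switch are hypothetical; only the flag names SP-F and undef-F come from the text. Each word is assumed to arrive as a list of (reading, similarity) pairs that already cleared the threshold.

```python
def flag_words(word_candidates, vocabulary, check_all=False):
    """Return one record per word with the SP-F and undef-F flags set.

    word_candidates: for each word, a list of (reading, similarity) pairs
    that cleared the similarity threshold during recognition (S202).
    """
    records = []
    for candidates in word_candidates:
        if not candidates:
            # No pattern cleared the threshold: there is no text to look up,
            # so only the ambiguity flag (or a "reading error") can be shown.
            records.append({"text": None, "SP-F": True, "undef-F": None,
                            "others": [], "reading_error": True})
            continue
        ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
        first = ranked[0][0]
        # Optionally run unknown word determination on lower candidates too.
        others = ranked[1:] if check_all else []
        records.append({
            "text": first,                          # first candidate
            "SP-F": len(ranked) > 1,                # word belongs to group A
            "undef-F": first not in vocabulary,     # word belongs to group B
            "others": [(t, t not in vocabulary) for t, _ in others],
            "reading_error": False,
        })
    return records
```

With `check_all=True`, each lower-ranked candidate carries its own unknown-word result, which is one way to support a next-candidate key.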

[0012] The processing of FIG. 2 admits several variations when described in more detail; examples are shown in FIGS. 3 to 5. FIG. 3 is a flowchart explaining the step in which candidate words requiring correction are pointed out (by the correction candidate indication unit 8) and the identifier for each word is displayed (by the identifier display unit 9). First, the first word of the document being processed is taken out and designated D (S301, S302). It is then determined whether D contains a character that character recognition found ambiguous, that is, whether D is in word group A in the recording unit 10 (S303). If so, D is a candidate word requiring correction, and this is indicated on the display unit 2 (S304). At the same time, an identifier is displayed showing that the indication was made because D contains a character found ambiguous in character recognition (S305). If step S303 finds no ambiguous character, it is next determined whether D is an unknown word, that is, whether D is in word group B in the recording unit 10 (S306). If D is an unknown word, it is a candidate word requiring correction, and this is indicated on the display unit 2 (S307). At the same time, an identifier is displayed showing that the indication was made because D is an unknown word (S308). When D is displayed on the display unit 2 with its identifier as a candidate word requiring correction, the operator examines it and corrects it if necessary (S206 in FIG. 2). After the indication and identifier display for D, the operator is asked whether to go on to the next candidate word requiring correction (S309); if so, these steps are repeated until the next candidate is found or the end of the data is reached. In the processing of FIG. 3, a word with ambiguity is not checked for being an unknown word; the final result for it is obtained solely through the operator's confirmation and correction.
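The FIG. 3 loop can be sketched as a Python generator. This is an illustrative reading of the flowchart, with word groups A and B modeled as plain sets of strings (a simplification, since the device stores flags per occurrence); the function name and identifier labels are assumptions.

```python
def fig3_candidates(words, group_a, group_b):
    """Yield (index, word, identifier) for each candidate requiring correction."""
    for i, d in enumerate(words):        # S301/S302: take the words in order as D
        if d in group_a:                 # S303: does D contain an ambiguous character?
            yield i, d, "ambiguity"      # S304/S305: point out D with its identifier
        elif d in group_b:               # S306: is D an unknown word?
            yield i, d, "unknown word"   # S307/S308
        # Words in neither group are skipped without being shown.
```

Because a generator is lazy, calling `next()` on it plays the role of the operator asking for the next candidate (S309). Note that a word in both groups is reported only as ambiguous, matching the last sentence of the paragraph.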

[0013] The processing shown in the flowchart of FIG. 4 differs from that of FIG. 3 in that words with ambiguity are also checked for being unknown words. As in FIG. 3, the words of the document being processed are taken out in order and designated D. If character recognition found D ambiguous, that is, D is in word group A (S403: yes), D is displayed as a candidate word requiring correction together with an identifier showing that ambiguity is the reason (S404, S405). The operator confirms and corrects the displayed D (S406), and D is then checked for being an unknown word (S407). If D was found to have no ambiguity (S403: no), it is likewise checked for being an unknown word (S407). If D is judged to be an unknown word, that is, D is in word group B (S408: yes), D is displayed as a candidate word requiring correction together with an identifier showing that being an unknown word is the reason (S408, S409). The operator confirms and corrects the displayed D (S410).

[0014] In the processing of FIG. 4, when a word flagged as a candidate requiring correction because of ambiguity really was a recognition error, the operator corrects it before the unknown word check. For words the operator has corrected, therefore, the corrected result D' may be passed to the unknown word determination unit 6 at S407 to determine afresh whether the correct word D' is an unknown word. Words then judged to be unknown words contain no ambiguity, so only words that are purely unknown words are displayed at S408 and S409. If the operator at S410 instructs the device to register the displayed unknown word in the vocabulary dictionary 7, the vocabulary dictionary is enriched, and correct words will no longer be flagged as candidates requiring correction because of gaps in the vocabulary dictionary.
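A minimal Python sketch of the FIG. 4 variant with the re-check of the corrected word D' is given below. The `correct` callback is a hypothetical stand-in for the operator at the input unit, and `register_unknown` models the optional instruction to add the word to the vocabulary dictionary; neither name comes from the source.

```python
def fig4_review(words, group_a, vocabulary, correct, register_unknown=False):
    """Return (corrected words, log of (word, identifier) displays).

    correct(word, reason) is a hypothetical callback standing in for the
    operator: it returns the confirmed or corrected word.
    """
    result, log = [], []
    for d in words:
        if d in group_a:                       # S403: ambiguous?
            log.append((d, "ambiguity"))       # S404, S405: display with identifier
            d = correct(d, "ambiguity")        # S406: operator confirms or corrects
        if d not in vocabulary:                # S407: check D (or corrected D') afresh
            log.append((d, "unknown word"))    # S408, S409
            d = correct(d, "unknown word")     # S410
            if register_unknown:
                vocabulary.add(d)              # enrich the vocabulary dictionary
        result.append(d)
    return result, log
```

Because the unknown word check runs on the corrected word, a misread such as a digit in place of a letter does not surface a second time as an unknown word once the operator has fixed it.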

[0015] The processing shown in the flowchart of FIG. 5 lets the operator set the conditions under which a word is pointed out as a candidate requiring correction. As in FIG. 3, the words of the document being processed are taken out in order and designated D. Processing is divided into four cases: D is in both word groups A and B (ambiguous and also an unknown word) (S503); D is in word group A but not in B (ambiguous but not an unknown word) (S504); D is in word group B but not in A (not ambiguous but an unknown word) (S505); and D is in neither word group (neither ambiguous nor an unknown word) (S506). In case S503, a recognition error is highly likely. In cases S504 and S505, the word may have been recognized correctly and flagged only because the ambiguity judgment was wrong or because too few words are registered in the vocabulary dictionary, respectively. The operator therefore sets whether these cases are to be displayed as candidates requiring correction, and they are displayed (S511, S513) only when display has been enabled (S510, S512). The operator examines the display, makes corrections as appropriate (S206 in FIG. 2), and registers the unknown word in the vocabulary dictionary if necessary. In case S503, where the word is both ambiguous and an unknown word, the operator can set in advance the priority that decides whether the identifier shown with the candidate is the ambiguity identifier (S507) or the unknown word identifier (S508). The logical conditions for display as a candidate requiring correction and the identifier priority described above may also be changed according to the rate at which ambiguous words occur.
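The four-way case split of FIG. 5 and the operator settings can be sketched in Python as below. The `DisplaySettings` fields and the function name are hypothetical names for the operator-configurable conditions; the step numbers in the comments follow the paragraph above.

```python
from dataclasses import dataclass


@dataclass
class DisplaySettings:
    show_ambiguous_only: bool = True   # S510: show words in A but not in B
    show_unknown_only: bool = True     # S512: show words in B but not in A
    prefer: str = "ambiguity"          # identifier priority when D is in both


def fig5_identifier(d, group_a, group_b, settings):
    """Return the identifier to display for word D, or None if D is not shown."""
    in_a, in_b = d in group_a, d in group_b
    if in_a and in_b:                  # S503: likely a recognition error
        return settings.prefer         # S507 or S508, per the operator's priority
    if in_a:                           # S504: ambiguous but in the dictionary
        return "ambiguity" if settings.show_ambiguous_only else None
    if in_b:                           # S505: unambiguous but unknown
        return "unknown word" if settings.show_unknown_only else None
    return None                        # S506: nothing to point out
```

For example, an operator working with a sparse vocabulary dictionary could set `show_unknown_only=False` so that only ambiguity-related candidates are pointed out.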

[0016] FIG. 7(a) shows an example of the display unit 2 during the processing described above. Within the recognition result data, the candidate word requiring correction is shown in reverse video, and the identifiers corresponding to this word D are displayed as, for example, "ambiguity+" when there is ambiguity and "ambiguity-" (or nothing) when there is not, and "unknown word+" when it is an unknown word and "unknown word-" (or nothing) when it is not.

[0017] In the processing of FIG. 5 the ambiguity flag and the unknown word flag are displayed simultaneously, whereas in the processing of FIG. 3 no unknown word flag is displayed for a word shown as "ambiguity+". In the processing of FIG. 4, the unknown word flag is displayed after the user has confirmed and corrected a word shown as "ambiguity+".
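The identifier labels described for the display can be produced by a small formatting helper. The following Python sketch is only one possible rendering; the function name and the `show_negative` switch (for the "or nothing" option) are assumptions.

```python
def format_identifiers(ambiguous, unknown, show_negative=True):
    """Build the identifier text shown next to a highlighted word."""
    labels = []
    for name, flag in (("ambiguity", ambiguous), ("unknown word", unknown)):
        if flag:
            labels.append(name + "+")
        elif show_negative:
            labels.append(name + "-")  # omitted entirely when show_negative is False
    return " ".join(labels)
```

For a word that is ambiguous but present in the dictionary, `format_identifiers(True, False)` gives `ambiguity+ unknown word-`.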

[0018] In the embodiment above, the ambiguity of character recognition is checked first and unknown words are checked second, but the order may be reversed. Also, in the embodiment above character recognition and unknown word determination are each carried out separately in advance, but they may be carried out together, or even together with display of the results. For example, in FIG. 4 the word D taken out at S403 may be character-recognized and its ambiguity judged; if ambiguous, it is displayed (S404, S405) and corrected (S406); at S407 it is then checked whether D is an unknown word absent from the vocabulary dictionary and, if so, it is displayed and corrected; only after this series of steps is the next word taken out as D. Various modifications of this kind are possible.

[0019] FIG. 6 shows an example configuration in which the character recognition device described above is used in a translation device. The reading unit 603 reads a document written in a first language (for example, English) as image information, and the character recognition unit 604 matches the patterns in this image information against the standard character patterns of the recognition dictionary 605 and recognizes them as character codes. The unknown word determination unit 606 performs unknown word determination according to whether a word formed by a sequence of these character codes exists in the vocabulary dictionary 607. The data that has undergone recognition and unknown word determination is sent to the translation unit 612, which converts it into text in a second language (for example, Japanese) using the translation dictionary 613. The vocabulary dictionary 607 is built by compressing the first-language data in the translation dictionary 613 so that it can be searched at high speed. Consequently, when an unknown word is added, the unknown word and its translation are registered in the translation dictionary as well as in the vocabulary dictionary.
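The relationship between the two dictionaries can be illustrated with a short Python sketch. Here a `set` of source-language keys stands in for the compressed, fast-lookup vocabulary dictionary, and a `dict` stands in for the translation dictionary; the class and method names are hypothetical, and the real compression scheme is not described in the source.

```python
class Dictionaries:
    """Keeps a vocabulary index in step with a translation dictionary."""

    def __init__(self, translation):
        self.translation = dict(translation)      # stand-in for translation dictionary 613
        self.vocabulary = set(self.translation)   # stand-in for vocabulary dictionary 607

    def is_unknown(self, word):
        # Unknown word determination only needs the key index.
        return word not in self.vocabulary

    def register(self, word, target):
        # A new word must go into both dictionaries, together with its
        # translation, or the translation step would still fail on it.
        self.translation[word] = target
        self.vocabulary.add(word)
```

Registering through a single method is one way to keep the index and the translation data from drifting apart.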

[0020] The storage unit 610 stores the image information from the reading unit 603, the results and intermediate progress of recognition by the character recognition unit 604, the determination results of the unknown word determination unit 606, the translation results of the translation unit 612, and so on. In response to instructions from the input unit 601 (for example, a keyboard) and on the basis of the data stored in the storage unit 610, the correction candidate indication unit 608 points out candidate words requiring correction to the display unit 602 and the identifier display unit 609, as described for the character recognition device above. The input unit 601 is used by the operator to correct the candidate words pointed out and to give instructions to the other units.

[0021] This translation device has two modes: (a) a manual mode, in which, before the data processed by the character recognition unit 604 and the unknown word determination unit 606 is translated, the user makes corrections through the input unit 601 while viewing the display unit 602, so that data free of recognition errors is sent to the translation unit 612; and (b) an automatic mode, in which everything from reading through translation is performed without human intervention, and the data processed by the translation unit 612 is then displayed on the display unit 602 for the user to correct.

[0022] FIG. 7 shows examples of how candidate words requiring correction are indicated on the display section 602. FIG. 7(a) shows the manual mode (the display is the same as that on the display section 2 of the character recognition device of FIG. 1), and FIG. 7(b) shows the automatic mode. In either case, the candidate words requiring correction are indicated one after another, for example in reverse video, and the identifier for the highlighted word is shown somewhere on the same screen in a form in which at least one of the ambiguity flag and the unknown word flag is +. Other display methods are also possible, for example using the color of the reverse video as the identifier. In (b) automatic mode, the character-recognized data and its translation are both displayed, each in its own area of the divided screen. In FIG. 7(a), for the word "claims" in the recognized manuscript, "ambiguity +" shows that other similar characters were found during pattern matching, and "unknown word -" shows that "claims" is a word in the dictionary. In FIG. 7(b), the word "variuus" in the recognized manuscript (the source text), which was "various" in the original manuscript, is shown to be both ambiguous and an unknown word. In the translated text, the portion corresponding to the unknown word "variuus" is indicated by, for example, a blank in reverse video.
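The two identifiers shown for each highlighted word can be modeled as a pair of flags. The Python sketch below is illustrative only: the names `word_flags` and `needs_correction`, the set-based vocabulary, and the idea of passing in the positions of ambiguous characters are assumptions made for the example, not details given in the patent.

```python
from typing import Dict, Iterable, Set


def word_flags(word: str,
               ambiguous_positions: Iterable[int],
               vocabulary: Set[str]) -> Dict[str, str]:
    """Return the '+' / '-' identifiers displayed for one recognized word.

    ambiguous_positions: indexes of characters for which pattern matching
    found other close candidates (assumed to come from the recognizer).
    vocabulary: words held in the vocabulary dictionary (assumed to be a set).
    """
    ambiguity = "+" if list(ambiguous_positions) else "-"
    unknown = "-" if word in vocabulary else "+"
    return {"ambiguity": ambiguity, "unknown word": unknown}


def needs_correction(flags: Dict[str, str]) -> bool:
    """A word is a correction candidate when at least one flag is '+'."""
    return "+" in flags.values()
```

With a vocabulary containing "claims" and "various", the word "claims" with one ambiguous character yields ambiguity + and unknown word -, matching the FIG. 7(a) example, while "variuus" yields + for both, matching FIG. 7(b).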

[0023] (a) In manual mode, data free of recognition errors is translated, so post-editing of the translation result becomes simpler. In addition, words shown with the unknown word identifier can be registered in the dictionary before translation, which also improves translation efficiency. The correction work at the recognition result stage is itself made more efficient, as explained for the character recognition device above. (b) In automatic mode, a word could conventionally fail to translate for various reasons: (1) the source data contained a reading error, (2) the word was an unknown word not registered in the dictionary, or (3) neither (1) nor (2) applied but, for example, the context could not be understood. Despite this, when a word could not be translated the operator had no way of telling these causes apart. This translation device solves the problem. Taking FIG. 7(b) as an example, the operator can easily infer why a word was not translated: if the ambiguity identifier is +, cause (1); if the unknown word identifier is +, cause (2); and if neither is + yet the word is still untranslated, cause (3).
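The inference the operator makes from the two identifiers can be written down as a small decision rule. This is a minimal sketch under stated assumptions: the `Cause` enum and the function `likely_causes` are hypothetical names, and returning both causes when both identifiers are + is an interpretation of the text rather than something it states explicitly.

```python
from enum import Enum
from typing import List


class Cause(Enum):
    READING_ERROR = 1   # cause (1): the source data contains a reading error
    UNKNOWN_WORD = 2    # cause (2): the word is not registered in the dictionary
    OTHER = 3           # cause (3): neither, e.g. the context was not understood


def likely_causes(ambiguity_plus: bool, unknown_plus: bool) -> List[Cause]:
    """Causes an operator would suspect for a word that was not translated."""
    causes = []
    if ambiguity_plus:
        causes.append(Cause.READING_ERROR)
    if unknown_plus:
        causes.append(Cause.UNKNOWN_WORD)
    # Neither identifier is '+', yet the word is untranslated: cause (3).
    return causes or [Cause.OTHER]
```

For the "variuus" example of FIG. 7(b), both identifiers are +, so the sketch reports causes (1) and (2) together.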

[0024] Conventional character recognition devices are provided as stand-alone systems dedicated to character recognition. Consequently, when character codes that have been recognized and stored, for example on the device's disk, are to be translated, they are first saved to a floppy diskette or the like and then registered in the translation system. The operator therefore has to learn the file names of the recognized documents, how to save them to floppy diskettes or the like, the code system, and so on. This operation is cumbersome and places a heavy burden on the operator.

[0025] Furthermore, whether a recognized character code string is wrong is checked using the spell-check dictionary built into the character recognition device (the vocabulary dictionaries 7 and 607 in the embodiments above). However, there has been the problem that it is not possible to judge, for example, whether the recognized character code string exists in the dictionary used by the system that performs the next desired process, such as a machine translation system.

[0026] Therefore, a character recognition system that reduces the operator's workload is described as a second embodiment. It provides control means on the character recognition device side so that a recognized character code string can be handed to the next processing system simply by specifying the processing to be performed next.

[0027] FIG. 8 is a block diagram showing the schematic configuration of a character recognition system according to the second embodiment. In FIG. 8, reference numeral 81 denotes an input section, whose input data is sent to the character recognition control section 83. The input section 81 allows the entry of characters and of the various commands that control the character recognition system; a keyboard, a mouse, or the like is used. For example, the instruction to start reading a manuscript at the image input section 82 is given by using the mouse to select the "Image reading" button shown on the display section 89.

[0028] The character recognition control section 83 controls the entire system. Here it controls the input section 81, the image input section 82, the image storage section 84, the character recognition section 85, the character code storage section 86, the character code editing section 87, the next-process control section 88, and the display section 89.

[0029] The image input section 82 optically reads the image of each sheet from one or more printed or handwritten manuscript pages set in it. The read image is sent to the character recognition control section 83, compressed or otherwise processed, and stored in the image storage section 84. When multiple pages are set, they are read in order up to the last page. The image input section 82 is also used when the operator requests rereading, page insertion, or the like during recognition processing. The image storage section 84 stores the read images page by page. The character recognition section 85 performs character recognition on the read images; the recognition technique is as described in the first embodiment. The character code storage section 86 stores the character codes recognized by the character recognition section 85, in association with the image of the page that was read.

[0030] The character code editing section 87 displays the character codes recognized by the character recognition section 85 on the display section 89 and performs editing such as insertion and deletion according to the user's instructions. For example, the user can search for ambiguous characters among the characters recognized by the character recognition section 85. When such a character is found, the corresponding portion of the read image held in the image storage section 84 is extracted and displayed near the ambiguous character. The character code string whose editing has been completed in the character code editing section 87 is stored in the character code storage section 86 again. The display section 89 is a device capable of visibly displaying information such as characters and read images, for example a bitmap display.

[0031] Before a character code string that has been recognized and edited is handed to another processing system, the next-process control section 88 lets the user select the processing to be applied to that string in the destination system. For example, when the string is handed to a translation system, the user selects whether it should only be registered as a document in the translation system, be registered and translated, or be translated and the translation result also printed. For translation, information such as the name of the dictionary to be used and the translation environment can also be set.

[0032] Next, the operation of the system configured as described above is explained. FIG. 9 is a state transition diagram that outlines the flow of processing in this character recognition system. FIGS. 10 to 19 are example screens used to explain the operation. FIG. 10 shows the initial screen of the character recognition system. The operator first enters the document identification name of the document to be read. When the operator then selects the "Image reading" button with the mouse, the image input section 82 starts operating (step 201).

[0033] While the image input section 82 is operating, the read images are displayed on the screen in order, as shown in FIG. 11. When there are multiple pages in the image input section 82, reading is repeated as long as pages remain (step 202). When image reading finishes, the screen returns to FIG. 10 (step 203).

[0034] Once back at the initial screen, the operator can change the document identification name and input the image of another manuscript. If input of a further manuscript is requested without changing the document identification name, the system asks the operator whether to append it to the manuscript already read or to discard the manuscript already read and input a new one.

[0035] After image input is complete, the operator selects the "Recognition" button with the mouse to run character recognition on the image (step 204). The character recognition system of this embodiment recognizes the layout of the manuscript before starting character recognition and presents the result to the operator so that it can be checked for errors. FIG. 12 shows an example of the screen that reports the layout recognition result to the operator. On this screen, the order of blocks that were recognized incorrectly is changed with the "Swap" button. The order can also be specified block by block with the "Specify all" button. The range to be recognized can be specified as well, and portions that do not need to be recognized can be deleted with the "Delete" button. For example, page numbers, headers, footers, and other parts that do not need machine translation can be deleted or excluded from the recognition range.

[0036] If the read image is not clear, the operator selects the "Reread" button (step 205). This re-inputs the image from the image input section 82 with changed reading conditions (density, contrast, manuscript position, and so on). A page that was missed at the image input stage can be inserted easily by selecting the "Insert page" button.

[0037] When the button for starting character recognition is selected in FIG. 12, character recognition starts in the order given by the recognized layout. During recognition, the position currently being recognized is shown on the right side of FIG. 13 (step 306), and the recognized character codes are displayed in order on the right side of the screen. When character recognition finishes, FIG. 14 is displayed. If the density of the manuscript was specified incorrectly or the manuscript was placed at an angle, many characters will be misrecognized. In that case, selecting the "Reread" button lets the operator read the manuscript for that page again (step 207).

[0038] Selecting the "Next candidate" button searches from the cursor position for a character that was ambiguous in character recognition (step 208). When such a character is found, the image is displayed near it, as shown in FIG. 15, and the operator corrects the character while referring to the image. If the character recognition section 85 also performs unknown word judgment, the function described in the first embodiment, namely displaying ambiguous characters and unknown words with their respective identifiers, can be added.

[0039] Character code strings that are not in the dictionary used by the translation system can also be presented at this point. That is, the vocabulary contained in the dictionary used for unknown word judgment is kept identical to that of the dictionary used by the translation system, and words carrying the unknown word identifier are presented. When the operator then enters the translation that should be used for a character code string, the string and its translation are registered in the dictionary used by the translation system. The string is also registered in the dictionary used for unknown word judgment. This function is useful with a translation system but is unnecessary when, for example, the data is sent to another computer.
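Keeping the two dictionaries in step, as described above, amounts to registering each new word in both places at once. The following Python sketch assumes, purely for illustration, that the translation dictionary is a mapping from word to translation and that the unknown-word dictionary is a set of words; `register_unknown_word` and `is_unknown` are hypothetical names.

```python
from typing import Dict, Set


def is_unknown(word: str, vocabulary: Set[str]) -> bool:
    """Unknown word judgment: the word is absent from the vocabulary."""
    return word not in vocabulary


def register_unknown_word(word: str,
                          translation: str,
                          translation_dictionary: Dict[str, str],
                          vocabulary: Set[str]) -> None:
    """Register a word and its translation so both dictionaries stay aligned."""
    # Dictionary used by the translation system: word -> translation.
    translation_dictionary[word] = translation
    # Dictionary used for unknown word judgment: the word is now known.
    vocabulary.add(word)
```

After registration the word no longer receives the unknown word identifier on later pages, because the same vocabulary is consulted for the judgment.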

[0040] When editing of one page is finished, the operator selects the "Next page" button (step 209). If a further page image exists, layout recognition of the next page starts. If there is no further page, the screen returns to FIG. 10. At this point character recognition of all pages is complete, so the "Image input" and "Recognition" buttons are displayed shaded. This shows that the next processing request (a translation request in this embodiment) is still outstanding for the read manuscript indicated by the document identification name. If, for example, five page images have been read and character recognition and editing have been carried out up to page 3 and then interrupted, only the "Image input" button is displayed shaded; the "Recognition" and "Translate" buttons are not shaded, which shows that images still requiring recognition remain. If the "Translate" button is selected while images requiring recognition remain, the screen of FIG. 16 is displayed and the operator is warned.
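The button shading described above follows from two counts: how many pages have been read and how many of them have been recognized. The sketch below is one possible reading, not the patent's implementation; the function names are hypothetical, and the behavior when no pages have been read (nothing shaded) is an assumption.

```python
from typing import Set


def shaded_buttons(pages_read: int, pages_recognized: int) -> Set[str]:
    """Return the buttons drawn shaded on the initial screen (FIG. 10)."""
    shaded: Set[str] = set()
    if pages_read > 0:
        # Images exist for this document, so "Image input" is shaded.
        shaded.add("Image input")
        if pages_recognized >= pages_read:
            # Every page is recognized; only the next process remains.
            shaded.add("Recognition")
    return shaded


def translate_needs_warning(pages_read: int, pages_recognized: int) -> bool:
    """True when "Translate" is selected while unrecognized images remain."""
    return pages_recognized < pages_read
```

With five pages read and three recognized, only "Image input" is shaded and selecting "Translate" would trigger the warning screen of FIG. 16.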

[0041] When the "Translate" button is selected, FIG. 17 is displayed (step 311). On this screen, the operator selects the processing that the system receiving the recognized data should perform next. In the embodiment, the operator selects one of the following by number: (1) register in the translation system, (2) register and then translate, or (3) translate and then print.

[0042] When the processing has been selected and the "Confirm" button is selected, the character code strings of the multiple pages indicated by the document identification name are combined into one and passed to the translation system together with the processing to be applied (step 212). The screen then returns to the initial screen. The operations performed by the next-process control section 88 while the screen of FIG. 17 is displayed are described below.

[0043] FIG. 20 shows the data flow when character recognition has finished, the next desired processing method has been selected for a document in the character code storage section 86, and the character code strings stored page by page are combined and handed to the next process. In the embodiment, this processing starts when a process number is selected on the screen of FIG. 17 and the "Confirm" button is selected. FIG. 21 is a flowchart explaining the operation when the "Confirm" button is selected. In step 701, it is determined whether a process number has been selected; if not, the operator is notified. Next, the selected processing method is output to the process control data (step 702). In the embodiment, this data is named by appending the identifier "ctl" to the document identification name.

[0044] Next, the recognized page-by-page character code strings stored in the character code storage section 86 are retrieved in page-number order and output as a single body of text data (step 703). In the embodiment, this data is named by appending the identifier "txt" to the document identification name.

[0045] The output process control data and text data are moved to an area agreed on in advance with the next desired process (steps 704 to 705). As this predetermined area for handing data to the next desired process, a shared area such as one on a disk is used, for example. The shared area need not be inside the system; an area on another computer connected via a network or the like can also be used.
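Steps 701 to 705 can be sketched as a single hand-over routine. The Python below is a minimal illustration under stated assumptions: the shared area is modeled as an in-memory dictionary from data name to contents, the control data format (`process=N`) is invented for the example, and `hand_over` is a hypothetical name. Only the "ctl" and "txt" identifiers come from the description.

```python
from typing import Dict, Optional


def hand_over(doc_name: str,
              pages: Dict[int, str],
              process_number: Optional[int],
              shared_area: Dict[str, str]) -> None:
    """Combine the recognized pages and place them in the shared area.

    pages maps a page number to that page's recognized character string.
    """
    # Step 701: a process number must have been selected.
    if process_number is None:
        raise ValueError("no process number selected")
    # Step 702: write the selected processing method as control data.
    control_data = f"process={process_number}\n"
    # Step 703: join the per-page strings in page-number order.
    text_data = "".join(pages[number] for number in sorted(pages))
    # Steps 704 to 705: move both items to the predetermined shared area.
    # The text is stored first so that a reader triggered by the control
    # data always finds the text already present (a design choice here).
    shared_area[f"{doc_name}.txt"] = text_data
    shared_area[f"{doc_name}.ctl"] = control_data
```

A real system would replace the dictionary with a directory on a shared disk or on a networked computer, as the description suggests.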

[0046] The other processing system, which performs the next desired process, waits for the character recognition system to output process control data and text data to the predetermined area (step 706). When process control data is output to the predetermined area, the system starts the process specified by that data. Alternatively, it starts processing by, for example, checking at regular intervals whether the data has been output (steps 707 to 711).
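The periodic check on the receiving side can be sketched as a simple polling loop. This is an assumption-laden illustration: `list_area` stands for any callable that returns the data names currently in the shared area, and the function names, the check interval, and the limit on the number of checks are all hypothetical.

```python
import time
from typing import Callable, Iterable, Optional


def find_ready_document(names: Iterable[str]) -> Optional[str]:
    """Return a document name whose control data and text data both exist."""
    present = set(names)
    for name in sorted(present):
        if name.endswith(".ctl"):
            doc_name = name[: -len(".ctl")]
            if f"{doc_name}.txt" in present:
                return doc_name
    return None


def wait_for_document(list_area: Callable[[], Iterable[str]],
                      interval_seconds: float = 1.0,
                      max_checks: int = 10,
                      sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Check the shared area at regular intervals until data appears."""
    for _ in range(max_checks):
        doc_name = find_ready_document(list_area())
        if doc_name is not None:
            return doc_name
        sleep(interval_seconds)
    return None
```

Passing `sleep` as a parameter keeps the loop testable without real delays; the caller would then read the control data to decide which process to start.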

[0047] In machine translation, for example, the processing method, the type of dictionary to be used for translation, and so on are taken from the process control data, and the text data input through the character recognition system is automatically registered as a document in the translation system, or registered and translated, or translated and the resulting translation printed. The description now returns to FIG. 9.

[0048] A function for displaying a list of documents is provided so that the state of the documents that have been read can be checked (step 213). FIG. 18 shows an example of the document list screen. The name of each document that has been read is displayed together with the number of pages read and the number of those pages for which recognition is complete. When a document is selected while the document list is displayed, the document identification name does not have to be entered (step 214). Unnecessary documents can be deleted easily by selecting the "Delete" button (step 215).

[0049] In addition, the dictionary used when the recognized character code string is sent to the translation system for translation can be set easily by selecting the "Dictionary selection" and "Translation environment settings" buttons (step 216). FIG. 19 shows an example of the dictionary selection screen. In the reading, character recognition, editing, and next-process control described above, information is managed page by page. The operation of the character recognition control section 83, which performs this management, is described below. FIG. 22 shows the relationship between the image storage section 84, which stores the multi-page images read by the image input section 82, and the character code storage section 86.

[0050] FIG. 22 is an example in which data is kept in correspondence by means of the document identification name entered on the initial screen shown in FIG. 10. It shows a state in which manuscripts with the two document identification names "ronbun" and "manual" have been input from the image storage section 84 and three pages of "ronbun" have been character-recognized.

[0051] In the embodiment, a data name is formed by appending to the document identification name a three-digit sequence number assigned in reading order, followed by a data identifier such as "rf" or "txt". In FIG. 22, the data with the identifier "rf", namely "ronbun004 to 005" and "patent001 to 004", indicates that the read images are held as they are in compressed form. "ronbun001 to 003" have already been character-recognized and stored as character codes in the character code storage section, so they have been erased from the image storage section 84. The embodiment shows page images being erased once they have been recognized, but they may instead be kept in the image storage section 84 until recognition of all pages is complete. The data with the identifier "txt" in the character code storage section 86, "ronbun001 to 003", indicates that character recognition has finished. In addition, the data named "ronbun.ctl" stores the next desired processing method, the dictionary type, the source text identification name, and so on, as selected in FIGS. 17 and 19.
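The naming scheme can be illustrated with a pair of helper functions. The sketch assumes the dotted form that appears later in the description (for example "ronbun.004.rf"); the function names `make_data_name` and `parse_data_name` and the regular expression are hypothetical.

```python
import re
from typing import Tuple

# document name, three-digit sequence number, data identifier
_DATA_NAME = re.compile(r"^(?P<doc>.+)\.(?P<page>\d{3})\.(?P<kind>rf|txt)$")


def make_data_name(doc_name: str, page: int, kind: str) -> str:
    """Build a page data name such as 'ronbun.004.rf'."""
    if kind not in ("rf", "txt"):
        raise ValueError(f"unexpected data identifier: {kind}")
    return f"{doc_name}.{page:03d}.{kind}"


def parse_data_name(name: str) -> Tuple[str, int, str]:
    """Split a page data name into (document name, page number, identifier)."""
    match = _DATA_NAME.match(name)
    if match is None:
        raise ValueError(f"not a page data name: {name}")
    return match.group("doc"), int(match.group("page")), match.group("kind")
```

In this model, "rf" marks a compressed page image awaiting recognition and "txt" marks the recognized character codes for a page.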

[0052] FIG. 23 is a flowchart explaining the rereading, page insertion, and page deletion operations performed on read images stored in the state shown in FIG. 22. The character recognition control section 83 first determines whether the processing request from the input section 81 concerns pages (steps 101 to 103). Requests other than page deletion, rereading, and page insertion are handled by other function processing (step 104).

[0053] If it is determined in step 103 that the operator's request is page insertion, a comparison counter used to retrieve the data identification names assigned page by page in the image storage section 84 is first cleared (step 105). Next, the data identification name indicated by the comparison counter is retrieved from the image storage section 84, and the document identification name (i) and page number (i) are extracted from it (step 106).

[0054] The document identification name (i) extracted in step 106 is then compared with the document identification name being processed (step 107). If they do not match, the name belongs to a different document, so the comparison counter is incremented by one (step 110). If they match, page number (i) is compared with the page number being processed (step 108).

[0055] If page number (i) is equal to or greater than the page being processed, the page number in the data identification name is incremented by one (step 109). The comparison counter is then incremented, and it is determined whether another data identification name remains in the image storage section 84; if so, processing returns to step 106 (steps 110 to 111).

[0056] For example, in FIG. 22, when the document identification name being processed is "ronbun" and page insertion is requested, character recognition has finished up to page 3, so the page being processed is page 4. The data identification names are therefore changed as follows: "ronbun.004.rf" to "ronbun.005.rf", and "ronbun.005.rf" to "ronbun.006.rf".
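The renumbering loop of steps 105 to 111 can be sketched as follows. This is an illustration only: the data names are modeled as a list of strings in the dotted form shown above, `renumber_for_insert` is a hypothetical name, and the function returns a new list instead of renaming stored data in place.

```python
from typing import List


def renumber_for_insert(names: List[str], doc_name: str, current_page: int) -> List[str]:
    """Shift page numbers up by one to make room for an inserted page."""
    result = []
    # Steps 105, 110, 111: walk through every data name in the image store.
    for name in names:
        # Step 106: extract the document name and page number.
        doc, page_text, kind = name.rsplit(".", 2)
        page = int(page_text)
        # Steps 107 and 108: same document, and at or after the current page?
        if doc == doc_name and page >= current_page:
            # Step 109: increase the page number by one.
            name = f"{doc}.{page + 1:03d}.{kind}"
        result.append(name)
    return result
```

If the names were files renamed in place, processing them from the highest page number downward would avoid overwriting a page that has not yet been moved; that ordering is a practical consideration added here, not something the description specifies.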

[0057] When the relevant data identification names have been changed, the image of the page to be inserted is read from the image input section 82 into the area indicated by the data identification name being processed, and the next process (layout recognition in the embodiment) is started (step 112). If it is determined in step 102 that the operator's request is rereading, the data identification names do not need to be changed, so step 112 is executed directly.

[0058] If it is determined in step 101 that the operator's request is page deletion, the data identification name being processed and the corresponding read image are first deleted from the image storage section 84 (step 113).

[0059] The data identification names are then changed in the same way as for page insertion, except that for page deletion the page number in the data identification name is decremented by one (step 118). The remaining steps 114 to 120 are the same as steps 105 to 111 for page insertion, so their description is omitted.
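Page deletion mirrors insertion: the current page's image is removed and the later pages of the same document move down by one. The sketch below makes the same modeling assumptions as a list of dotted data names, uses the hypothetical name `delete_page`, and assumes that only pages after the deleted one are renumbered.

```python
from typing import List


def delete_page(names: List[str], doc_name: str, current_page: int) -> List[str]:
    """Remove the current page image and close the gap in page numbers."""
    target = f"{doc_name}.{current_page:03d}.rf"
    result = []
    for name in names:
        # Step 113: drop the data name (and image) of the page being processed.
        if name == target:
            continue
        doc, page_text, kind = name.rsplit(".", 2)
        page = int(page_text)
        # Steps 114 to 120: same walk as insertion, but the page number of
        # each later page in the same document is decreased by one (step 118).
        if doc == doc_name and page > current_page:
            name = f"{doc}.{page - 1:03d}.{kind}"
        result.append(name)
    return result
```

Deleting page 4 of "ronbun" from the FIG. 22 state would therefore leave the former page 5 image stored as "ronbun.004.rf".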

[0060] The gist of the second embodiment described above is a character recognition system characterized by comprising: means for reading the image of a printed or handwritten manuscript; means for performing character recognition on the read image; means for editing the characters encoded by the character recognition; and means for specifying the processing to be applied next to the edited character code string. It is also a character recognition system characterized by comprising: means for reading the images of a printed or handwritten manuscript of multiple pages; means for storing the read multi-page images page by page; means for rereading, inserting, or deleting pages one page at a time at the reading means; means for character recognition and editing of the stored multi-page images page by page; and means for combining the multi-page character code strings into one after character recognition and editing of all stored pages is complete.

[0061] Therefore, with a character recognition system configured in this way, when images of a plurality of pages are read in sequence and subjected to character recognition and editing, the processing can be carried out without the operator being aware of pages. After character recognition and editing have been performed on all of the read page images, the recognized and edited character code strings are combined into one, and the operator specifies the processing to be performed on the combined character code string by the next processing system. The next processing can thus be executed without the operator being aware of file names, code systems, and the like, which greatly reduces the operator's labor.
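
As a rough illustration of combining the per-page character code strings into one, the following sketch assumes that the character code storage unit is modeled as a dictionary from page number to recognized text, and that combining is refused while any page is still unrecognized (compare the warning screen of FIG. 16). The function name `combine_pages` and the use of an exception are hypothetical.

```python
def combine_pages(page_codes, total_pages):
    """Join the recognized text of pages 1..total_pages in page order.

    `page_codes` maps a page number to its recognized and edited character
    code string. Raising on missing pages is an assumed policy.
    """
    missing = [p for p in range(1, total_pages + 1) if p not in page_codes]
    if missing:
        raise ValueError(f"pages not yet recognized: {missing}")
    return "".join(page_codes[p] for p in range(1, total_pages + 1))


codes = {2: "second page. ", 1: "first page. "}
print(combine_pages(codes, 2))
```

The combined string is what would then be handed to the next processing system, so that system never needs to know how many pages were read.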

[0062] In the above embodiment, the recognized character code string is passed to translation as the next processing, but it may instead be transmitted to another computer. Furthermore, the layout information may be converted into a DTP document format. In addition, the image may be input not from a scanner but from data transmitted by fax. In that case, an image received by fax can be character-recognized, the recognized character code string translated by the translation system, and the result even printed.

[0063]

[Effects of the Invention] As explained above, the character recognition device according to the present invention can show the operator the reason why each recognition-error candidate has been pointed out, and allows the conditions for pointing out candidates to be set. This reduces the operator's burden in post-editing after character recognition and improves work efficiency, which are significant practical effects.
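
The two reasons for flagging a word can be sketched as follows: either a character was chosen from several patterns whose similarity reached the threshold, or the resulting word is absent from the vocabulary dictionary. This is only an illustration under stated assumptions. The similarity scores, the threshold value 0.8, the `"?"` placeholder for a character with no candidate, and all function names are hypothetical, and the real device compares image patterns against a recognition dictionary rather than receiving ready-made scores.

```python
AMBIGUOUS = "ambiguous"  # several patterns scored at or above the threshold
UNKNOWN = "unknown"      # word not found in the vocabulary dictionary


def recognize_char(scores, threshold):
    """Return (best character, whether more than one candidate qualified).

    `scores` maps a candidate character to its similarity (assumed input).
    """
    candidates = [c for c, s in scores.items() if s >= threshold]
    if not candidates:
        return None, False
    best = max(candidates, key=lambda c: scores[c])
    return best, len(candidates) > 1


def flag_word(char_scores, vocabulary, threshold=0.8):
    """Recognize one word and return it with the set of reasons it is flagged."""
    chars = []
    ambiguous = False
    for scores in char_scores:
        best, is_ambiguous = recognize_char(scores, threshold)
        chars.append(best if best is not None else "?")  # "?" is a placeholder
        ambiguous = ambiguous or is_ambiguous
    word = "".join(chars)
    reasons = set()
    if ambiguous:
        reasons.add(AMBIGUOUS)
    if word not in vocabulary:
        reasons.add(UNKNOWN)
    return word, reasons


vocabulary = {"OK"}
print(flag_word([{"O": 0.90, "0": 0.85}, {"K": 0.95}], vocabulary))
print(flag_word([{"O": 0.90}, {"X": 0.95}], vocabulary))
```

A display routine could then render each reason with a different identifier, for example a distinct underline or color per reason, so the operator sees why a word was flagged.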

[Brief explanation of drawings]

FIG. 1 is a block diagram showing a schematic configuration of a character recognition device according to an embodiment of the present invention.

FIG. 2 is a flowchart showing an overview of the operation of the character recognition device according to an embodiment of the present invention.

FIG. 3 is a flowchart showing an example of the operation of the step in FIG. 2 of pointing out candidate words requiring correction and displaying identifiers for those words.

FIG. 4 is a flowchart showing another example of the operation from the step in FIG. 2 of pointing out candidate words requiring correction and displaying identifiers for those words onward.

FIG. 5 is a flowchart showing another example of the operation of the step in FIG. 2 of pointing out candidate words requiring correction and displaying identifiers for those words.

FIG. 6 is a block diagram showing a configuration example of a translation device using the character recognition device according to the present invention.

FIG. 7 is a diagram showing examples of the screens of (a) the display unit 2 in FIG. 1 and (b) the display unit 602 in FIG. 6.

FIG. 8 is a block diagram showing the configuration of a character recognition system according to the second embodiment.

FIG. 9 is a state transition diagram for explaining the flow of processing in the second embodiment.

FIG. 10 is a diagram of the initial screen of the character recognition system.

FIG. 11 is a screen diagram of the character recognition system during image input.

FIG. 12 is a screen diagram showing layout recognition results of the character recognition system.

FIG. 13 is a screen diagram of the character recognition system during character recognition.

FIG. 14 is a screen diagram showing character recognition results of the character recognition system.

FIG. 15 is a screen diagram when searching for ambiguous characters.

FIG. 16 is a screen diagram when a warning is issued in response to a request for the next processing (translation) made before recognition of all pages has been completed.

FIG. 17 is a screen diagram when the operator is asked to select the processing to be performed next.

FIG. 18 is a screen diagram when displaying a list of read documents.

FIG. 19 is a screen diagram when the operator is asked to select the dictionary to be used for translation.

FIG. 20 is a diagram showing the flow of data when recognized data is passed to the next processing.

FIG. 21 is a flow diagram showing the operation of the next processing control unit 88, which specifies the processing to be performed by the next processing system on data whose recognition has been completed.

FIG. 22 is a diagram showing the relationship between the image storage unit 84, which stores images of multiple pages, and the character code storage unit 86.

FIG. 23 is a flow diagram showing the operation of the character recognition control unit 83 with respect to performing reading, character recognition, storage, editing, and the like page by page.

[Explanation of symbols]

1, 601 ... Input unit
2, 602 ... Display unit
3, 603 ... Reading unit
4, 604 ... Character recognition unit
5, 605 ... Recognition dictionary
6, 606 ... Unknown word determination unit
7, 607 ... Vocabulary dictionary
8, 608 ... Correction candidate word indication unit
9, 609 ... Identifier display unit
10, 610 ... Storage unit
11, 611 ... Control unit
12 ... Image file
612 ... Translation unit
613 ... Translation dictionary
81 ... Input unit
82 ... Image input unit
83 ... Character recognition control unit
84 ... Image storage unit
85 ... Character recognition unit
86 ... Character code storage unit
87 ... Character code editing unit
88 ... Next processing control unit
89 ... Display unit

Claims

[Claims]

Claim 1: A character recognition device comprising: input means for inputting characters as an image pattern; character recognition means for comparing the image pattern input by the input means with character patterns in a first dictionary stored in advance and selecting the character of a character pattern whose degree of similarity is equal to or higher than a predetermined value; determination means for determining whether a character string consisting of the characters obtained by the character recognition means exists among the character strings of a second dictionary stored in advance; and display means for displaying the characters obtained by the character recognition means, characterized in that, when characters are displayed by the display means, a character string consisting of the displayed characters is displayed so as to distinguish the case where it includes a character selected by the character recognition means from among a plurality of character patterns whose degree of similarity is equal to or higher than the predetermined value from the case where the determination means has determined that it does not exist among the character strings of the second dictionary.