JPS5848180A

JPS5848180A - Character decision processing system

Info

Publication number: JPS5848180A
Application number: JP56145897A
Authority: JP
Inventors: Yukiyasu Iida; 飯田　行恭; Shunkichi Tada; 多田　俊吉; Yasuhiro Yamada; 山田　康宏; Kazuaki Komori; 小森　和昭
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1981-09-16
Filing date: 1981-09-16
Publication date: 1983-03-22

Abstract

PURPOSE: To read with high precision a character string that contains characters of different kinds with similar shapes, by inputting in advance, through a form, information on whether the two characters of a two-character string may be adjacent to each other, and by counting the frequencies of two-character strings during the reading operation. CONSTITUTION: Information on whether the two characters of a two-character string may be adjacent to each other is input in advance through a form, and during the reading operation the frequencies of two-character strings are counted, so that a plausible candidate character can be selected for each input character. For example, a form is scanned by a scanning part 20 under the command of a control part 10, and the picture signal of the input character obtained by photoelectric conversion is transferred to a recognition part 30, which performs recognition processing and sends the codes of the candidate characters for the input character to the control part 10. When the input form is a context definition form, the readout result is written into a context table storage part 60 through a context definition form interpretation part 50. When the input form is a form to be read, the candidate characters are sent to a decision part 70, whose decision result is sent out through an output line 72, and a character string frequency counting part 80 counts the frequencies of the two-character strings.

Description

DETAILED DESCRIPTION OF THE INVENTION The present invention relates to a character determination processing method, and in particular to a character determination processing method in a character reading device that reads forms on which multiple character types, such as kana, alphabetic characters, and numerals, are written in a mixture.

When characters of different types such as kana, alphabetic characters, and numerals are mixed, there are many pairs of characters that closely resemble each other, for example the kana 'エ' and the letter 'I', or the numeral '7' and the kana 'り'. When reading character strings containing mixed character types, conventional character reading devices ensured reading accuracy by having writers emphasize the differences between such similar characters, for example by writing the right-hand stroke of the numeral '7' so that it curves convexly to the left, and that of the kana 'り' so that it curves convexly to the right.

However, this approach increases the psychological burden on the writer, and because fine differences in character shape must be examined during recognition processing, the recognition logic circuitry becomes large and the device becomes expensive. As an alternative, methods have been proposed in which a dictionary of the words to be read is prepared in advance and the reading result is compared with the dictionary and corrected, thereby improving reading accuracy. These methods have their own drawbacks: when the number of target words is large, the dictionary requires a large storage capacity, which makes the whole device expensive; the processing time required for matching increases, which lowers reading speed; and a dictionary must be created for each kind of word to be read, which makes dictionary preparation cumbersome.

The present invention focuses on the fact that, in character strings containing a mixture of kana, alphabetic characters, and numerals, the combinations in which characters of different types are adjacent are often limited to some extent. It provides a character determination processing method in which information on whether the two characters of a two-character string may be adjacent (hereinafter called the context) is entered in advance using a form, and the frequencies of two-character strings are counted during reading, so that context information is accumulated inside the character reading device and used to select among the candidate characters for an input character. Its purpose is to accumulate information useful for determination in the device easily, without the cumbersome procedures of the conventional methods, and to use that information to read with high accuracy mixed-type character strings that contain similarly shaped characters.

FIG. 1 shows an embodiment of the present invention, in which 10 is a control unit, 20 is a scanning unit, 30 is a recognition unit, 40 is an operation unit having a display, 50 is a context definition form interpretation unit, 60 is a context table storage unit, 70 is a determination unit, 80 is a character string frequency counting unit, 71 is a connection to the control unit, and 72 is an output line.

In operation, the scanning unit 20 scans the form under a command from the control unit 10 and transfers the image signal of the input character, obtained by photoelectric conversion, to the recognition unit 30. The recognition unit 30 then performs recognition processing under a command from the control unit 10 and sends the codes of the candidate characters for the input character to the control unit 10.

If the character code in the form type field of the input form indicates that the input form is a context definition form, the control unit 10 sends the reading result for the form to the context definition form interpretation unit 50.

At this point, if necessary, the control unit 10 displays the reading result on the display of the operation unit 40, and the operator can check the reading result on the display and correct erroneous or rejected characters using the keyboard of the operation unit 40. The corrected character codes are sent to the context definition form interpretation unit 50.

The context definition form interpretation unit 50 interprets the contents of the context definition form and writes the context information for the two-character strings described on the form into the context table in the context table storage unit 60.

If the character code written in the form type field of the input form indicates a form to be read, the control unit 10 sends the candidate character codes for the input characters to the determination unit 70 one unit character string at a time (for example, the character string of one field). The determination unit 70 scans the candidate character codes of the unit character string and, for each input character to which multiple candidate characters correspond, refers to the context table stored in the context table storage unit 60, using the reading results of the characters immediately before and after that input character, to determine which candidate character can be adjacent to those neighboring characters.

When processing of one unit character string is complete, the determination unit 70 outputs the determination result on output line 72.

The character string frequency counting unit 80 counts the frequencies of two-character strings based on the determination results for each unit character string sent from the determination unit 70. For any two-character string whose frequency exceeds a certain value, it rewrites the corresponding entry of the context table stored in the context table storage unit 60 to indicate that adjacency is allowed.

In the configuration of the present invention, a context definition form is input and the appearance frequency of two-character strings is counted while the forms to be read are being read, so that the context of characters is accumulated in the context table inside the character reading device. The accumulated context information is then used to determine the appropriate candidate among multiple candidate characters, which makes it possible to read with high accuracy text that mixes multiple character types containing many similarly shaped characters.

FIG. 2 shows an example of a context definition form, in which 510 is the form type field, 520 is the preceding character group designation field, 530 is the adjacency information designation field, and 540 is the following character group designation field.

The form type field is provided to indicate whether or not the form is a context definition form; if a predetermined character or character string is entered in this field, the form is a context definition form.

The group of characters that come first is entered in the preceding character group designation field 520, and the group of characters that come second is entered in the following character group designation field 540. In the adjacency information designation field 530, a symbol is entered that indicates whether the characters in the preceding character group designation field 520 and the characters in the following character group designation field 540 written on the same row may be adjacent (in FIG. 2, '+' is used as the symbol for adjacency allowed and '-' as the symbol for adjacency not allowed).

To keep the description simple, the fields that designate character groups make use of a predetermined ordering of characters, such as 0, 1, 2, ..., 9 for numerals or the a-i-u-e-o order for kana. For example, to designate the group of characters running from one character to another in that order, a range notation that joins the first and last characters with a hyphen is permitted.

In the example of FIG. 2, the second row means that kana never follow a numeral, and the third row indicates that the kana 'り' and 'チ' are allowed to follow a numeral. Together, the second and third rows therefore state that only the kana 'り' and 'チ' can follow a numeral. The context definition form interpretation unit 50 serves to write the information described on the context definition form into the context table.
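The interpretation of such rows can be sketched in Python. This is only an illustration under stated assumptions, not the circuit described in the patent: the character orders in `ORDERS`, the function names, and the sample rows are hypothetical, and '+' and '-' are assumed to be the adjacency symbols. Later rows override earlier ones, which is how a broad "not allowed" row followed by a narrow "allowed" row leaves only the narrow pairs permitted.

```python
# Hypothetical predefined character orders used to expand range notations.
ORDERS = ["0123456789", "アイウエオカキクケコサシスセソタチツテト"]

def expand_group(spec):
    """Expand a group spec: 'X-Y' is a range in a known order, anything else is literal."""
    if len(spec) == 3 and spec[1] == "-":
        first, last = spec[0], spec[2]
        for order in ORDERS:
            if first in order and last in order:
                return list(order[order.index(first):order.index(last) + 1])
        raise ValueError(f"no common character order for {spec!r}")
    return list(spec)

def interpret_rows(rows, table=None):
    """Apply rows of (preceding group, '+' or '-', following group) in order.

    Each pair is stored as 1 (adjacency allowed) or 0 (not allowed);
    a later row overwrites the entries written by an earlier row.
    """
    table = {} if table is None else table
    for before, mark, after in rows:
        value = 1 if mark == "+" else 0
        for a in expand_group(before):
            for b in expand_group(after):
                table[(a, b)] = value
    return table

# Row 1: no kana may follow a numeral. Row 2: except 'チ'.
rows = [("0-9", "-", "ア-ト"), ("0-9", "+", "チ")]
table = interpret_rows(rows)
```

With these sample rows, the pair ('3', 'チ') ends up allowed while every other numeral-kana pair stays disallowed.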

FIG. 3 shows an example of a context table.

As shown in FIG. 3, the context table indicates, for each pair of characters drawn from the kana, alphabetic characters, numerals, and symbols, whether the two characters may be adjacent, may not be adjacent, or whether this is unknown, using for example the three values 1, 0, and α.

Here, α is a number greater than 0 and less than 1. For example, the figure shows that the kana 'り' and 'チ' may follow a numeral, that alphabetic characters do not follow a numeral, and that it is unknown whether kana may follow a symbol.

As initial values, the context table assigns 1 to combinations of the same character type (for example kana with kana, or alphabetic character with alphabetic character) and a smaller value to combinations of different character types. This initial setting reflects the fact that, in a character string, two adjacent characters are more likely to be of the same character type than of different types.
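A minimal Python sketch of such an initialization follows. It rests on several assumptions that are not in the source: the value used for different-type pairs is taken to be α (unknown), α is set to 0.3 purely for illustration, and the character classification in `char_type` is a simplified stand-in for the device's own character sets.

```python
ALPHA = 0.3  # assumed value for "unknown"; chosen only for illustration

def char_type(ch):
    """Classify a character into a coarse type (simplified, hypothetical rules)."""
    if ch.isdigit():
        return "digit"
    if "A" <= ch <= "Z":
        return "alpha"
    if "ア" <= ch <= "ン":
        return "kana"
    return "symbol"

def initial_value(a, b, different=ALPHA):
    """Initial table entry: 1 for same-type pairs, a smaller value otherwise."""
    return 1 if char_type(a) == char_type(b) else different

def build_initial_table(alphabet):
    """Build the full pair table for a given set of characters."""
    return {(a, b): initial_value(a, b) for a in alphabet for b in alphabet}

table = build_initial_table("7Aア-")
```

Entries written later from a context definition form, or by frequency counting, would overwrite these defaults.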

FIG. 4 shows the configuration of the determination unit.

710 is a candidate character register, 720 is a reject character detection circuit, 730 is a context table address generation circuit, 740 is an addition circuit, 750 is a maximum value detection circuit, 71 is the connection to the control unit, and 72 is the output line.

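The purpose of the determination unit is to correct rejects: for an input character to which multiple candidate characters correspond, it selects from among the candidates the character most likely to be adjacent to the characters before and after that input character. The codes of the candidate characters of a unit character string, sent over input line 71, are stored sequentially in the candidate character register 710. The reject character detection circuit 720 then scans the register and detects the position $i$ of a reject character $C_i$, that is, a character that may be rejected because multiple ($n$) candidate characters $C_i^1, \ldots, C_i^n$ correspond to it.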
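The scan performed by the reject character detection circuit can be pictured with a short Python sketch. The register representation (a list holding one list of candidate characters per position) and the function name are assumptions made for illustration.

```python
def find_reject_positions(register):
    """Return the indices whose slot holds more than one candidate character."""
    return [i for i, candidates in enumerate(register) if len(candidates) > 1]

# One slot per input character; ambiguous slots hold several candidates.
register = [["A"], ["I", "1"], ["B"], ["0", "O"]]
positions = find_reject_positions(register)
```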
The purpose of this method is to correct rejects (for input characters, by selecting from candidate characters the character that is most likely to be adjacent to the characters before and after the mosquito input character). Codes of candidate characters of the unit character string sent from the input line 71 are sequentially stored in the candidate character register. Next, the reject character detection circuit 720 scans the register and detects a certain reject character C4 that is rejected because a plurality of (-) candidate characters C%, . . . , C; Detect the position of.

Next, if the reading results for the characters immediately before and after that position are not rejects, their candidate characters $C_{i-1}$ and $C_{i+1}$ are read from the candidate character register, and the $2n$ two-character strings $C_{i-1}C_i^1$, $C_i^1C_{i+1}$, $C_{i-1}C_i^2$, $C_i^2C_{i+1}$, ..., $C_{i-1}C_i^n$, $C_i^nC_{i+1}$ are sent in turn to the context table address generation circuit 730.

The context table address generation circuit 730 sequentially generates the addresses in the context table at which the context information for the two-character strings $C_{i-1}C_i^1$, $C_i^1C_{i+1}$, ..., $C_i^nC_{i+1}$ is stored, and sends the address information over connection 731 to the context table storage unit. The context information $R(C_{i-1}C_i^1)$, $R(C_i^1C_{i+1})$, ..., $R(C_i^nC_{i+1})$ for these two-character strings, returned from the context table storage unit over connection 741 (each value is 0, α, or 1, where α is the number defined earlier), is input to the addition circuit 740.

The addition circuit 740 computes $R(C_{i-1}C_i^k) + R(C_i^kC_{i+1})$ for each candidate character $C_i^k$ ($k = 1, \ldots, n$) and sends the results to the maximum value detection circuit 750. The maximum value detection circuit 750 selects the candidate $C_i^{k_0}$ for which this sum is largest and writes it to the corresponding character position in the candidate character register, so that the previously stored $C_i^1, \ldots, C_i^n$ are replaced by $C_i^{k_0}$.

When the above processing has been completed for every character in the candidate character register that has multiple candidates, the codes of the candidate characters finally obtained are output on output line 72. Any character $C_i$ whose candidates could not be narrowed down to one is output as a reject code.

FIG. 5 shows the configuration of the character string frequency counting circuit. 810 is a shift register, 820 is a working memory address generation circuit, 830 is a working memory, 840 is an addition circuit, 850 is a comparison circuit, 72 is the connection to the determination unit 70, and 81 is the connection to the context table storage unit 60.

The purpose of this character string frequency counting circuit is to count the appearance frequency of two-character strings and, for any two-character string whose frequency exceeds a certain value, to judge that the two characters can be adjacent to each other and to rewrite the contents of the corresponding location in the context table to adjacency allowed (that is, 1).

The character codes sent over connection 72 are stored sequentially in the shift register 810. The shift register 810 always holds the codes of two characters and passes each pair of codes in turn to the working memory address generation circuit 820. The working memory 830 stores the appearance frequency of each two-character string. If neither of the two codes it receives is a reject code, the working memory address generation circuit 820 generates the address in the working memory 830 that corresponds to that two-character string and sends the address information to the working memory 830. The contents read from the working memory 830 are sent to the addition circuit 840, which adds 1 to them, writes the result back to the original address in the working memory, and also sends it to the comparison circuit 850.

The comparison circuit 850 determines whether the addition result exceeds a predetermined threshold, and if it does, writes 1 to the address in the context table that corresponds to that two-character string.

As described above, according to the present invention, the context of the characters in the forms to be read is supplied by inputting a context definition form and by counting the appearance frequency of two-character strings while the forms are being read, so that this data is accumulated in the context table inside the device, and the accumulated information can be used to determine the appropriate candidate among multiple candidate characters. With this character determination processing method, high-accuracy reading can be achieved, without increasing the scale of the recognition logic, for mixed-type text such as kana, alphabetic characters, and numerals, in which many similarly shaped characters exist.

[Brief explanation of the drawing]

FIG. 1 is a block diagram of one embodiment of the present invention, FIG. 2 is a diagram showing an example of a context definition form, FIG. 3 is a diagram showing an example of a context table, FIG. 4 is a block diagram of one embodiment of the determination unit, and FIG. 5 is a block diagram of one embodiment of the character string frequency counting unit. In the figures, 10 is the control unit, 20 is the scanning unit, 30 is the recognition unit, 40 is the operation unit, 50 is the context definition form interpretation unit, 60 is the context table storage unit, 70 is the determination unit, 71 is a connection between the control unit and the other units, 72 is the output line, and 80 is the character string frequency counting unit. Patent applicant: Nippon Telegraph and Telephone Public Corporation. Agent: patent attorney 森　１）　寛

Claims

[Claims]

(1) A character determination processing method for a character reading system that optically reads forms written in a mixture of characters of multiple character types, comprising: recognition means capable of outputting a plurality of candidate characters for one input character; means for reading a context definition form that describes the arrangement of two-character strings in the forms to be read, and for creating a context table that indicates whether each pair of two consecutive characters is allowed to be adjacent; and means for selecting, from among the candidate characters for an input character and on the basis of the information in the context table, a character that can be adjacent to the preceding or following character.
(2) The character determination processing method according to claim 1, wherein the means for creating the context table comprises means for counting, from the reading results obtained while reading the characters on an input form, the appearance frequency of the two-character strings that appear on the form, and for rewriting the contents of the context table according to the appearance frequency of those two-character strings.