JPS5848180A - Character decision processing system - Google Patents

Character decision processing system

Info

Publication number
JPS5848180A
Authority
JP
Japan
Prior art keywords
character
characters
input
context
character strings
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP56145897A
Other languages
Japanese (ja)
Inventor
Yukiyasu Iida
飯田 行恭
Shunkichi Tada
多田 俊吉
Yasuhiro Yamada
山田 康宏
Kazuaki Komori
小森 和昭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nippon Telegraph and Telephone Corp
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corp
Priority to JP56145897A
Publication of JPS5848180A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition
    • G06V30/24: Character recognition characterised by the processing or recognition method
    • G06V30/242: Division of the character sequences into groups prior to recognition; Selection of dictionaries

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Character Discrimination (AREA)

Abstract

PURPOSE: To read with high precision a character string that mixes several character types containing similarly shaped characters, by inputting in advance, through a form, information on whether two characters may be adjacent to each other (the context), and by counting the frequencies of two-character strings during reading. CONSTITUTION: Information on whether two characters may be adjacent is input in advance through a form, and during reading the frequencies of two-character strings are counted; the accumulated information is used to select a candidate character for each input character. For example, a form is scanned by a scanning part 20 under the command of a control part 10, and the picture signal of the input character obtained by photoelectric conversion is transferred to a recognition part 30, which performs recognition processing and sends the codes of the candidate characters for the input character to the control part 10. When the input form is a context definition form, the readout result is written into a context table storage part 60 through a context definition form interpretation part 50. When the input form is a form to be read, the candidate characters are sent to a decision part 70, whose decision result is sent out through an output line 72, and a character string frequency counting part 80 counts the frequencies of the two-character strings.

Description

[Detailed Description of the Invention]

The present invention relates to a character decision processing system, and in particular to a character decision processing system for a character reading device that reads forms on which characters of several types, such as kana, alphabetic characters, and numerals, are written together.

When characters of different types such as kana, alphabetic characters, and numerals are mixed, there are many pairs of characters that closely resemble each other, for example the kana 'エ' and the letter 'I', or the numeral '7' and the kana 'り'. To read character strings of mixed character types, conventional character reading devices secured reading accuracy by having the writer exaggerate the differences between such similar characters. For example, the numeral '7' had to be written with its right-hand stroke curving convexly to the left, and the kana 'り' with the stroke curving convexly to the right.

This approach, however, increases the psychological burden on the writer. It also requires the recognition process to examine minute differences in character shape, so the recognition logic circuit grows large and the device becomes expensive. As an alternative, methods have been proposed in which a dictionary of the words to be read is prepared in advance, the reading result is compared with the dictionary, and the reading result is corrected to raise the reading accuracy. These methods have drawbacks of their own. When the number of target words is large, the dictionary needs a large storage capacity, which makes the whole device expensive; the processing time needed for matching increases, which lowers the reading speed; and because a dictionary must be built for each kind of word to be read, the dictionary preparation procedure is cumbersome.

The present invention starts from the observation that, in character strings mixing kana, alphabetic characters, and numerals, the combinations in which different characters adjoin each other are in many cases limited to some extent. Information on whether or not two characters may be adjacent (hereinafter called the context) is input in advance by means of a form, and the frequencies of two-character strings are counted during reading. In this way context information is accumulated inside the character reading device, and that information is used to select a candidate character for each input character. The object of the invention is to accumulate information useful for the decision inside the device easily, without the cumbersome procedures of the conventional methods, and to use that information to read, with high accuracy, mixed-type character strings that contain similarly shaped characters.

FIG. 1 shows an embodiment of the present invention, in which 10 is a control unit, 20 is a scanning unit, 30 is a recognition unit, 40 is an operation unit with a display, 50 is a context definition form interpretation unit, 60 is a context table storage unit, 70 is a determination unit, 80 is a character string frequency counting unit, 71 is a connection with the control unit, and 72 is an output line.

In operation, the scanning unit 20 scans the form at the command of the control unit 10 and transfers the image signal of each input character, obtained by photoelectric conversion, to the recognition unit 30. The recognition unit 30 then performs recognition processing at the command of the control unit 10 and sends the codes of the candidate characters for the input character to the control unit 10.

If the character code in the form type field of the input form indicates that the input form is a context definition form, the control unit 10 sends the reading result of the form to the context definition form interpretation unit 50.

If necessary at this point, the control unit 10 shows the reading result on the display of the operation unit 40. The operator can check the reading result on the display and use the keyboard of the operation unit 40 to correct erroneous and rejected characters. The corrected character codes are sent to the context definition form interpretation unit 50.

The context definition form interpretation unit 50 interprets the contents of the context definition form and writes the context information about the two-character strings described on the form into the context table in the context table storage unit 60.

If the character code written in the form type field of the input form indicates a form to be read, the control unit 10 sends the candidate character codes for the input characters to the determination unit 70, one unit character string (for example, the character string of one field) at a time. The determination unit 70 scans the candidate character codes of the unit character string. For each input character that has several candidate characters, it takes the reading results of the characters before and after that input character, refers to the context table stored in the context table storage unit 60, and determines which candidate character can be adjacent to the preceding and following characters.

When the processing of one unit character string is complete, the determination unit 70 outputs the determination result on the output line 72.

The character string frequency counting unit 80 counts the frequencies of two-character strings on the basis of the determination results for the unit character strings sent from the determination unit 70. For any two-character string whose frequency exceeds a certain value, the corresponding entry of the context table stored in the context table storage unit 60 is rewritten to "adjacency allowed".

In the configuration of the present invention, the context of characters is accumulated in the context table inside the character reading device both by inputting context definition forms and by counting the occurrence frequencies of two-character strings while the target forms are read. The accumulated context information is then used to determine the appropriate candidate character among several candidates. Mixed-type text containing many similarly shaped characters can therefore be read with high accuracy.

FIG. 2 shows an example of a context definition form, in which 510 is a form type field, 520 is a front character group specification field, 530 is an adjacency information specification field, and 540 is a rear character group specification field.

The form type field indicates whether or not the form is a context definition form. If a predetermined character or character string is entered in this field, the form is a context definition form.

The front character group specification field 520 holds the group of characters that come first, and the rear character group specification field 540 holds the group of characters that come second. The adjacency information specification field 530 holds a symbol indicating whether the characters in the front character group specification field 520 and the characters in the rear character group specification field 540 on the same row may or may not be adjacent. In FIG. 2, '+' is used as the symbol for "adjacency allowed" and '−' as the symbol for "adjacency not allowed".

To keep the description short, the fields that specify a character group may rely on a predetermined character order, such as 0, 1, 2, ..., 9 for numerals or the a-i-u-e-o order (アイウエオ順) for kana. For example, a character group running from one character to another may be written as the first and last characters joined by '−'.
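The range notation for character groups can be sketched as a small expansion routine. This is an illustrative sketch only: the function name `expand_group`, the shortened kana ordering, and the use of an ASCII hyphen as the range mark are assumptions, not details given in the source.

```python
# Hypothetical predetermined orderings. The source names the digit order and
# the kana a-i-u-e-o order only as examples; KANA is shortened for illustration.
DIGITS = "0123456789"
KANA = "アイウエオカキクケコ"


def expand_group(spec: str, orderings=(DIGITS, KANA)) -> list[str]:
    """Expand a spec such as '0-9' into every character of the range.

    A three-character spec with '-' in the middle is treated as a range over
    one of the known orderings; any other spec is taken as an explicit list
    of characters.
    """
    if len(spec) == 3 and spec[1] == "-":
        first, last = spec[0], spec[2]
        for order in orderings:
            if first in order and last in order:
                lo, hi = order.index(first), order.index(last)
                if lo > hi:
                    raise ValueError(f"range {spec!r} runs backwards")
                return list(order[lo:hi + 1])
        raise ValueError(f"no ordering contains both ends of {spec!r}")
    return list(spec)


print(expand_group("0-9"))
print(expand_group("ア-ウ"))
```

Keeping the orderings as plain strings means a new character type only needs one more string in `orderings`.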

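In the example of FIG. 2, the second row states that no kana may follow a numeral, and the third row states that the kana 'り' and 'チ' are allowed to follow a numeral. Taken together, the second and third rows therefore state that, among the kana, only 'り' and 'チ' can follow a numeral. The context definition form interpretation unit 50 writes the information described on the context definition form into the context table.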
The example in Figure 2 means that in the second line, a kana is not adjacent to the number, and in the third line, the kana -riichi 9M chi is allowed after the number. Indicates that Therefore, the second and third lines indicate that only kana 1, 1, 1, and 1 can be adjacent after the number. The context definition form folding unit 50 functions to write descriptions onto the context definition form.

FIG. 3 shows an example of a context table.

As shown in FIG. 3, the context table records, for the kana, alphabetic characters, numerals, and symbols, whether each pair of two characters is allowed to be adjacent, not allowed, or unknown, using for example the three values 1, 0, and α.

Here α is a number satisfying $0 < \alpha < 1/2$. In the figure, for example, the kana 'り' and 'チ' may follow a numeral, no alphabetic character follows a numeral, and whether a kana follows a symbol is unknown.

As initial values, the context table is assumed to hold 1 for combinations of the same character type (for example kana with kana, or alphabetic character with alphabetic character) and α for combinations of different character types. This initial setting reflects the fact that, in a character string, two adjacent characters are more likely to be of the same character type than of different character types.
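The initial state of the context table can be sketched as a dictionary keyed by character pairs. This is a minimal sketch under stated assumptions: the concrete value `ALPHA = 0.25`, the rough classifier `char_type`, and the dictionary representation are all illustrative choices, not taken from the source.

```python
ALLOWED = 1
ALPHA = 0.25  # assumed value for "unknown"; the source only bounds alpha


def char_type(ch: str) -> str:
    """Very rough character-type classifier, for illustration only."""
    if ch.isascii() and ch.isdigit():
        return "digit"
    if ch.isascii() and ch.isalpha():
        return "letter"
    if "\u30a1" <= ch <= "\u30fa":  # katakana letters
        return "kana"
    return "symbol"


def initial_context_table(alphabet):
    """Same-type pairs start as allowed (1); mixed-type pairs start as alpha."""
    return {
        (front, rear): ALLOWED if char_type(front) == char_type(rear) else ALPHA
        for front in alphabet
        for rear in alphabet
    }


table = initial_context_table("A7ア+")
print(table[("ア", "ア")], table[("7", "ア")])
```

Entries read from a context definition form, or promoted by frequency counting, would then overwrite these defaults.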

FIG. 4 shows the configuration of the determination unit.

710 is a candidate character register, 720 is a reject character detection circuit, 730 is a context table address generation circuit, 740 is an adder circuit, 750 is a maximum value detection circuit, 71 is a connection with the control unit, and 72 is an output line.

The purpose of the determination unit is to correct rejects: for an input character that has several candidate characters, it selects the candidate most likely to be adjacent to the characters before and after that input character. The candidate character codes of a unit character string, sent over input line 71, are stored one after another in the candidate character register. The reject character detection circuit 720 then scans the register and detects the position of each reject character $C_i$, that is, a character liable to be rejected because several ($n$) candidate characters $C_i^1, \dots, C_i^n$ correspond to it.
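The scan performed by the reject character detection circuit can be sketched in a few lines. The representation is an assumption: each position of the unit character string is modelled as a list of candidate characters, and the function name `ambiguous_positions` is hypothetical.

```python
def ambiguous_positions(unit_string):
    """Return the indices whose input character has more than one candidate."""
    return [i for i, candidates in enumerate(unit_string) if len(candidates) > 1]


# One unit character string: the middle character could be '1' or 'I'.
unit_string = [["A"], ["1", "I"], ["7"]]
print(ambiguous_positions(unit_string))
```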

Next, if the reading results of the characters immediately before and after that position are not rejects, their candidate characters $C_{i-1}$ and $C_{i+1}$ are read from the candidate character register, and the $2n$ two-character strings $C_{i-1}C_i^1$, $C_i^1C_{i+1}$, $C_{i-1}C_i^2$, $C_i^2C_{i+1}$, ..., $C_{i-1}C_i^n$, $C_i^nC_{i+1}$ are sent in turn to the context table address generation circuit 730.
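The enumeration of the two-character strings for one ambiguous position can be sketched as below. Two points are assumptions: a neighbour counts as "not a reject" when it has exactly one candidate, and positions at the start or end of the string, which the source does not discuss, are refused. The name `neighbour_pairs` is hypothetical.

```python
def neighbour_pairs(unit_string, i):
    """List the 2n two-character strings for position i.

    Returns None when the character before or after position i is itself
    unresolved (has more than one candidate).
    """
    if not 0 < i < len(unit_string) - 1:
        raise IndexError("position must have a character on both sides")
    before, after = unit_string[i - 1], unit_string[i + 1]
    if len(before) != 1 or len(after) != 1:
        return None
    prev_char, next_char = before[0], after[0]
    pairs = []
    for candidate in unit_string[i]:
        pairs.append(prev_char + candidate)  # preceding character + candidate
        pairs.append(candidate + next_char)  # candidate + following character
    return pairs


print(neighbour_pairs([["A"], ["1", "I"], ["7"]], 1))
```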

The context table address generation circuit 730 generates, one after another, the addresses of the context table entries that hold the context information for the two-character strings $C_{i-1}C_i^1$, $C_i^1C_{i+1}$, ..., $C_i^nC_{i+1}$, and sends the address information over connection 731 to the context table storage unit. The context information $R(C_{i-1}C_i^1)$, $R(C_i^1C_{i+1})$, ..., $R(C_i^nC_{i+1})$ for those two-character strings, returned from the context table storage unit over connection 741, is input to the adder circuit 740. Each value is one of 0, α, or 1, where α is the number satisfying $0 < \alpha < 1/2$ described above.

The adder circuit 740 computes $R(C_{i-1}C_i^k) + R(C_i^kC_{i+1})$ for each candidate character $C_i^k$ ($k = 1, \dots, n$) and sends the results to the maximum value detection circuit 750. The maximum value detection circuit 750 selects the candidate $C_i^{k_0}$ for which this sum is largest and writes $C_i^{k_0}$ into the corresponding character position of the candidate character register. The previously stored candidates $C_i^1, \dots, C_i^n$ are thus replaced by $C_i^{k_0}$.
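The adder and maximum-value stages amount to scoring each candidate and keeping the best one. The sketch below makes several assumptions: the table is a dictionary keyed by character pairs, pairs missing from it default to the "unknown" value `ALPHA` (set here to 0.25), and a tie for the maximum is treated as "could not be narrowed down" and returned as `None`. The function name `select_candidate` is hypothetical.

```python
ALPHA = 0.25  # assumed value for "unknown"; the source only bounds alpha


def select_candidate(table, prev_char, candidates, next_char):
    """Return the candidate c maximising R(prev, c) + R(c, next).

    Returns None when several candidates share the maximum score, which is
    this sketch's stand-in for a character that stays rejected.
    """
    scores = [
        table.get((prev_char, c), ALPHA) + table.get((c, next_char), ALPHA)
        for c in candidates
    ]
    best = max(scores)
    winners = [c for c, score in zip(candidates, scores) if score == best]
    return winners[0] if len(winners) == 1 else None


table = {("A", "I"): 1, ("I", "B"): 1, ("A", "1"): 0, ("1", "B"): 0}
print(select_candidate(table, "A", ["1", "I"], "B"))
```

With this table the candidate 'I' scores 1 + 1 = 2 against 0 + 0 = 0 for '1', so 'I' is selected.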

When this processing has been completed for every character in the candidate character register that has several candidates, the codes of the candidate characters finally obtained are output on the output line 72. Any character $C_i$ whose candidates could not be narrowed down by then is output as a reject code.

FIG. 5 shows the configuration of the character string frequency counting circuit. 810 is a shift register, 820 is a working memory address generation circuit, 830 is a working memory, 840 is an adder circuit, 850 is a comparison circuit, 72 is the connection with the determination unit 70, and 81 is the connection with the context table storage unit 60.

The purpose of the character string frequency counting circuit is to count the occurrence frequency of each two-character string and, for any two-character string whose frequency exceeds a certain value, to judge that the two characters may be adjacent and to rewrite the corresponding entry of the context table to "adjacency allowed" (that is, 1).

The character codes sent over connection 72 are stored one after another in the shift register 810. The shift register 810 always holds the codes of two characters and passes each two-character code in turn to the working memory address generation circuit 820. The working memory 830 stores the occurrence frequency of each two-character string. If neither of the two codes it receives is a reject code, the working memory address generation circuit 820 generates the address in the working memory 830 that corresponds to the two-character string and sends the address information to the working memory 830. The contents read from the working memory 830 are sent to the adder circuit 840, which adds 1 to them, writes the sum back to the original address in the working memory, and also sends it to the comparison circuit 850.

The comparison circuit 850 checks whether the sum exceeds a predetermined threshold and, if it does, writes 1 to the address of the context table that corresponds to the two-character string.
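The counting and promotion steps can be sketched in software as follows. This is a sketch under stated assumptions rather than the circuit itself: `collections.Counter` plays the role of the working memory, `None` is a hypothetical marker for a reject code, pairs are counted only within one unit character string, and the class name `PairFrequencyCounter` is invented for illustration.

```python
from collections import Counter

REJECT = None  # hypothetical marker for a reject code


class PairFrequencyCounter:
    """Counts adjacent character pairs and promotes frequent ones to 'allowed'."""

    def __init__(self, table, threshold):
        self.table = table          # context table: (front, rear) -> value
        self.threshold = threshold  # predetermined threshold
        self.counts = Counter()     # stands in for the working memory

    def feed(self, decided_string):
        # Walking over adjacent positions mimics the two-character shift register.
        for front, rear in zip(decided_string, decided_string[1:]):
            if front is REJECT or rear is REJECT:
                continue  # pairs containing a reject code are not counted
            self.counts[(front, rear)] += 1
            if self.counts[(front, rear)] > self.threshold:
                self.table[(front, rear)] = 1  # rewrite to "adjacency allowed"


table = {("7", "チ"): 0.25}
counter = PairFrequencyCounter(table, threshold=2)
for _ in range(3):
    counter.feed(["7", "チ"])
print(table[("7", "チ")])
```

The entry is rewritten on the third occurrence, the first time the count exceeds the threshold of 2.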

As described above, according to the present invention the context of characters on the forms to be read is accumulated in the context table inside the device, both by inputting context definition forms and by counting the occurrence frequencies of two-character strings while the target forms are read. This information can then be used to determine the appropriate candidate character among several candidates. With this character decision processing system, text that mixes kana, alphabetic characters, numerals, and other character types containing many similarly shaped characters can be read with high accuracy without enlarging the recognition logic.

[Brief Description of the Drawings]

FIG. 1 is a block diagram of one embodiment of the present invention; FIG. 2 is a diagram showing an example of a context definition form; FIG. 3 is a diagram showing an example of a context table; FIG. 4 is a block diagram of one embodiment of the determination unit; and FIG. 5 is a block diagram of one embodiment of the character string frequency counting unit.

In the figures, 10 is the control unit, 20 is the scanning unit, 30 is the recognition unit, 40 is the operation unit, 50 is the context definition form interpretation unit, 60 is the context table storage unit, 70 is the determination unit, 71 is the connection between the control unit and the other units, 72 is the output line, and 80 is the character string frequency counting unit.

Patent applicant: Nippon Telegraph and Telephone Public Corporation
Agent: patent attorney 森 1) 寛

Claims (1)

[Claims]

(1) A character decision processing system for a character reading system that optically reads forms on which characters of several character types are written together, characterized by comprising: recognition means capable of outputting the codes of several candidate characters for one input character; means for reading a context definition form, which describes information on the arrangement of two-character strings on the forms to be read, and creating a context table that indicates whether any pair of two consecutive characters is allowed to be adjacent; and means for selecting, from the candidate characters for an input character and on the basis of the information in the context table, a character that can be adjacent to the preceding or following character.

(2) The character decision processing system according to claim 1, characterized in that the means for creating the context table comprises means for counting, from the reading results and while the characters on an input form are being read, the occurrence frequencies of the two-character strings appearing on the form, and for rewriting the contents of the context table according to those occurrence frequencies.
JP56145897A 1981-09-16 1981-09-16 Character decision processing system Pending JPS5848180A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP56145897A JPS5848180A (en) 1981-09-16 1981-09-16 Character decision processing system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP56145897A JPS5848180A (en) 1981-09-16 1981-09-16 Character decision processing system

Publications (1)

Publication Number Publication Date
JPS5848180A (en) 1983-03-22

Family

ID=15395576

Family Applications (1)

Application Number Title Priority Date Filing Date
JP56145897A Pending JPS5848180A (en) 1981-09-16 1981-09-16 Character decision processing system

Country Status (1)

Country Link
JP (1) JPS5848180A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS61144994A (en) * 1984-12-18 1986-07-02 Atena Syst:Kk Hair creation method
JPH0316073B2 (en) * 1984-12-18 1991-03-04 Atena Shisutemu Kk
JPS62182982A (en) * 1986-02-07 1987-08-11 Nippon Telegr & Teleph Corp <Ntt> Automatic detection system for wrong word in japanese text
JPS63188011U (en) * 1987-05-27 1988-12-01
JPH0431055Y2 (en) * 1987-05-27 1992-07-27
WO2002001984A1 (en) 2000-07-03 2002-01-10 Phild Co., Ltd Hair design system and its applications
US9173743B2 (en) 2009-07-01 2015-11-03 Biomet Uk Limited Method of implanting a unicondylar knee prosthesis

Similar Documents

Publication Publication Date Title
US3995254A (en) Digital reference matrix for word verification
US3852720A (en) Method and apparatus for automatically generating korean character fonts
EP0274426A2 (en) Computer memory system
US3763467A (en) Method and apparatus for reading documents
US3839702A (en) Bayesian online numeric discriminant
JPS5848180A (en) Character decision processing system
US3737852A (en) Pattern recognition systems using associative memories
KR940002474B1 (en) Processing method of english graphic code in korean character and apparatus therefor
JPS5882373A (en) Online character recognizing method
JPS6174080A (en) Display system of character recognizing device
JPS58125183A (en) Method for displaying unrecognizable character in optical character reader
JP3924899B2 (en) Text search apparatus and text search method
JPS5949628B2 (en) optical character reader
JPS5814710B2 (en) pattern classification device
JPS5958971A (en) Coding circuit
JP3310063B2 (en) Document processing device
JPH08202811A (en) Character reader
JPS6029823A (en) Adaptive symbol string conversion method
JPS6128276A (en) Decoding circuit
JPS5935466B2 (en) Character reading method and device
JPH0863487A (en) Method and device for document retrieval
JPS58165180A (en) Character certifying device
JPS61248169A (en) Picture information processer
JPH02150980A (en) How to recognize characters and words
JPH05314304A (en) Character recognition device