JPH0210472B2


Info

Publication number: JPH0210472B2
Application number: JP57222489A
Authority: JP
Inventors: Sueji Myahara
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1982-12-18
Filing date: 1982-12-18
Publication date: 1990-03-08
Also published as: JPS59112367A

Description

DETAILED DESCRIPTION OF THE INVENTION: The present invention relates to a character reading method. More particularly, it relates to a character reading method that can read the characters of a document whose character pitch is not constant, both accurately and at high speed.

FIG. 1 shows a conventional character reading device. The character patterns on a form are scanned and photoelectrically converted, then stored temporarily in a pattern memory 2 via a signal input terminal 1. For each character line in the pattern memory 2, a character cutting unit 3 cuts out one character at a time, at a predetermined fixed interval, starting from the beginning of the character string. Each character pattern is transferred to an identification unit 4, which determines which character it is, and the identification result is output from an output terminal 5. Consequently, if the character pitch on the form is not constant, the characters cannot be read at all.

A device has also been proposed that adds an auxiliary identification unit 6 to the character cutting unit 3 of the above reader, so that characters can be read from forms with a non-constant character pitch. Only narrow patterns are passed to the auxiliary identification unit 6, which determines whether each one is a whole character or part of a character. If it is judged to be a whole character, the narrow pattern is cut out as one character. If it is judged to be part of a character, it is joined to the following pattern and the two are cut out together.

However, this device has difficulty achieving sufficient cutting accuracy in three situations:

- when a narrow pattern has the same shape as another character (for example, in a horizontally written form, the left-hand radical "亻" of the kanji "化" has the same shape as the katakana "イ");
- when a broken stroke splits one character into many patterns;
- when two or more characters written in contact are judged to be a single character.

In addition, whenever the auxiliary identification unit 6 is operating, the character cutting unit 3 stops until the judgment is returned, which slows reading. This slowdown becomes more pronounced as the number of patterns the auxiliary identification unit 6 must handle increases.

The present invention eliminates these drawbacks as follows. When a plurality of patterns exist within a predetermined fixed interval of a character line on a form, the invention cuts out candidate characters by combining those patterns in sequence. It recognizes all of the candidates and selects from the results the one that is most character-like. The object is to provide a character reading method that can read, accurately and at high speed, the characters of documents such as forms whose character pitch is not constant. The invention is described in detail below with reference to the drawings.

FIGS. 3 to 8 show an embodiment of the present invention, with the following reference numerals:

- 11: input terminal
- 12: pattern memory
- 13: character cutting unit
- 14: identification unit
- 15: character determination unit
- 16: output terminal

The device operates as follows.

1. A photoelectric conversion device (not shown) converts the characters on a form into pattern data. The data are stored temporarily in the pattern memory 12 via the input terminal 11.
2. The character cutting unit 13 cuts out of the pattern memory 12 a line pattern 20 containing one line of characters, as shown in FIG. 4. It scans this line pattern along the line direction (arrow X in the figure). For each column (arrow Y in the figure), it records black ("1" in binary) if any part of the pattern is present and white ("0") if none is. The resulting data 30 are hereinafter called black-column data.
3. Based on the black-column data 30, the character cutting unit 13 executes the processing described below to cut individual patterns 21 out of the line pattern 20. It sends each individual pattern 21 to the identification unit 14 in turn, paired with information about how it was cut out.
4. The identification unit 14 recognizes only the individual pattern 21 of each pair. It sends the recognition result (for example, a character code), paired with the cutting information, to the character determination unit 15 in turn.
5. The character determination unit 15 processes these data as described below and outputs only the correct recognition results, one after another, from the output terminal 16.
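One way to derive the black-column data is sketched below in Python. It assumes that the line pattern 20 is available as a binary image stored as a list of rows, with 1 for a black pixel. The function name `black_column_data` and this data layout are assumptions for illustration; the patent does not specify them. A column is marked black when any pixel in it is black, which amounts to a vertical projection of the line pattern.

```python
from typing import List

def black_column_data(line_pattern: List[List[int]]) -> List[int]:
    """Return 1 for each column (X position) holding at least one black pixel, else 0.

    line_pattern[y][x] is 1 for black and 0 for white; all rows have equal length.
    """
    if not line_pattern:
        return []
    width = len(line_pattern[0])
    # Checking each column in the Y direction is a vertical projection of the line.
    return [1 if any(row[x] for row in line_pattern) else 0 for x in range(width)]

# A hypothetical 3-row line pattern, 6 columns wide.
pattern = [
    [1, 0, 0, 1, 1, 0],
    [1, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1, 0],
]
print(black_column_data(pattern))  # [1, 0, 0, 1, 1, 0]
```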

The character cutting unit 13 cuts out the individual patterns 21 as follows. The starting point is the beginning of the black-column data 30, hereinafter called the reference position. From there, the unit counts the sets of contiguous black columns (hereinafter, black-column clusters) within a preset fixed interval α.

- If there is one cluster, that interval is treated as the individual pattern of one character.
- If there are several, the unit builds a series of patterns. It starts from the first cluster and adds the following consecutive clusters one at a time, and each of these patterns is treated as the individual pattern of one character. The position obtained by removing the first of these clusters becomes the reference position of the next interval α.

The procedure is described in detail below with reference to the flowchart of FIG. 5. In the figure, DNO is an operation number. It indicates how many times the operation of detecting the number N of black-column clusters within the interval α has been repeated. PNO is a pattern number. It indicates the order in which individual patterns formed by combining black-column clusters (hereinafter, combination patterns) are created. Together, DNO and PNO make up the cutting information.
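Black-column clusters are maximal runs of 1s in the black-column data. The sketch below shows one way to extract them, using the hypothetical function `black_clusters` and inclusive column indices. The patent defines the clusters but does not say how they are computed.

```python
from typing import List, Optional, Tuple

def black_clusters(columns: List[int]) -> List[Tuple[int, int]]:
    """Return (start, end) column indices, inclusive, of each maximal run of black columns."""
    clusters: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for x, value in enumerate(columns):
        if value and start is None:
            start = x                          # a run of black columns begins
        elif not value and start is not None:
            clusters.append((start, x - 1))    # the run ended at the previous column
            start = None
    if start is not None:                      # the run reaches the end of the line
        clusters.append((start, len(columns) - 1))
    return clusters

print(black_clusters([0, 1, 1, 0, 0, 1, 0, 1, 1, 1]))  # [(1, 2), (5, 5), (7, 9)]
```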

First, the unit counts the number N of black-column clusters within the interval α that begins at the start of the black-column data 30, that is, at the starting position for character cutting.

- N = 0, blank interval: the interval is treated as a space. DNO = 1 and PNO = 1 are assigned, and the space pattern is sent to the identification unit 14.
- N = 0, whole interval occupied by a black-column cluster: the position where that cluster ends is detected, and the span is treated as touching characters. It is forcibly separated, DNO = 1 and PNO = 1 are assigned to each resulting individual pattern, and each is sent to the identification unit 14.
- N = 1: the interval is treated as the individual pattern of one character. DNO = 1 and PNO = 1 are assigned, and the pattern is sent to the identification unit 14.
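The interval test of FIG. 5 might be sketched as follows. The sketch counts a run of black columns that begins at or after `ref` only if it also ends inside the interval [ref, ref + α). This is one reading of the text, and it is consistent with the N = 0 "whole interval black" case. The function name `classify_interval` and its return labels are hypothetical. Forced separation of touching characters is not modeled, and any N = 0 situation the text does not describe is reported as "unspecified".

```python
from typing import List, Tuple

def classify_interval(columns: List[int], ref: int, alpha: int) -> Tuple[str, int]:
    """Count black-column clusters inside [ref, ref + alpha) and classify the interval.

    A run of black columns is counted only if it ends inside the interval;
    a run that continues past the interval's right edge is not counted.
    """
    limit = min(ref + alpha, len(columns))  # exclusive end of the interval
    n = 0
    x = ref
    while x < limit:
        if columns[x]:
            # Follow the run to its true end, even beyond the interval.
            while x < len(columns) and columns[x]:
                x += 1
            if x <= limit:  # last black column (x - 1) lies inside the interval
                n += 1
        else:
            x += 1
    window = columns[ref:limit]
    if n == 0 and not any(window):
        return "space", 0          # N = 0, blank interval
    if n == 0 and all(window):
        return "touching", 0       # N = 0, interval entirely black: touching characters
    if n == 0:
        return "unspecified", 0    # N = 0 case the text does not describe
    return ("single" if n == 1 else "multiple"), n

columns = [1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]
print(classify_interval(columns, 0, 6))  # ('multiple', 2)
```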

When N > 1, the black-column clusters are combined in order from the first one, without changing their order of appearance, to create N combination patterns. Each combination pattern is assigned the operation number DNO and a pattern number PNO from 1 to N, and is sent to the identification unit 14. For example, suppose N = 3 and the clusters are a, b, and c:

1. The first pass sends three combination patterns, with their cutting information, to the identification unit 14: "a" (DNO = 1, PNO = 1), "ab" (DNO = 1, PNO = 2), and "abc" (DNO = 1, PNO = 3).
2. The first cluster "a" is removed and the process is repeated with DNO = 2. This sends "b" (DNO = 2, PNO = 1) and "bc" (DNO = 2, PNO = 2).
3. Cluster "b" is removed and the process is repeated with DNO = 3. This sends "c" (DNO = 3, PNO = 1) to the identification unit 14.
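The sequence of combination patterns and cutting information for the N = 3 example above can be reproduced with the sketch below. It works on clusters given as (first column, last column) pairs and rests on three interpretations that the patent does not state. First, every cluster in turn becomes the first cluster of an interval, so the reference position is the start of that cluster. Second, DNO keeps counting up while the previous interval held more than one cluster, and restarts at 1 otherwise. Third, a cluster wider than α is passed through whole rather than forcibly separated. The name `cut_combinations` is hypothetical.

```python
from typing import List, Tuple

Cluster = Tuple[int, int]            # (first column, last column) of a black-column cluster
Record = Tuple[int, int, int, int]   # (DNO, PNO, first cluster index, last cluster index)

def cut_combinations(clusters: List[Cluster], alpha: int) -> List[Record]:
    records: List[Record] = []
    dno, prev_n = 0, 0
    for i, (ref, _) in enumerate(clusters):
        # The reference position is the first column of cluster i.
        n = 0
        while i + n < len(clusters) and clusters[i + n][1] < ref + alpha:
            n += 1  # cluster i + n ends inside [ref, ref + alpha)
        n = max(n, 1)  # cluster wider than alpha: passed through, no forced separation
        dno = dno + 1 if prev_n > 1 else 1  # DNO counts up only along a chain of N > 1
        for pno in range(1, n + 1):
            # Combination pattern = clusters i .. i + pno - 1, added one at a time.
            records.append((dno, pno, i, i + pno - 1))
        prev_n = n
    return records

names = "abc"
for dno, pno, first, last in cut_combinations([(0, 2), (4, 5), (7, 9)], alpha=10):
    print(dno, pno, names[first:last + 1])
```

With α = 10 and three narrow clusters, this prints 1 1 a, 1 2 ab, 1 3 abc, 2 1 b, 2 2 bc, 3 1 c, one per line. The output matches the a/ab/abc, b/bc, c sequence described in the text.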

The character determination unit 15 handles the two cases as follows.

- If the cutting information shows that the pattern in an interval α was treated as the individual pattern of one character, the unit outputs its recognition result unchanged.
- If the interval was treated as several patterns, the unit discards rejects from the recognition results of the combination patterns. It outputs, as the correct result, the result of the combination pattern with the longest pattern width. It then excludes any later recognition result whose pattern contains black-column clusters from that interval.

FIG. 6 is a flowchart of this processing. From the cutting information (the operation number DNO and pattern number PNO), the unit determines whether a result belongs to a combination pattern of the target interval to be output as a reading result. It stores the recognition results in a buffer. From the stored results, it outputs as the reading result the one recognized with the longest combination pattern.
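The determination rule can be sketched as follows. Results are grouped into target intervals, with a new interval starting at PNO = 1. Any interval whose first cluster already belongs to an accepted character is skipped. Rejects are dropped, and the candidate with the longest pattern width is kept. The sketch relies on several assumptions, none of which come from the patent:

- each result carries the indices of the first and last cluster it covers, in addition to DNO and PNO;
- a reject is represented as `None`;
- a hypothetical "?" is emitted when every candidate in an interval is rejected.

The example stream loosely mirrors the "ル化を" portion of FIG. 4, with widths invented for illustration.

```python
from typing import List, Optional, Tuple

# (DNO, PNO, first cluster index, last cluster index, label or None for a reject, pattern width)
Result = Tuple[int, int, int, int, Optional[str], int]

def decide(results: List[Result]) -> List[str]:
    # Split the result stream into target intervals; PNO == 1 opens a new one.
    # Assumes the stream starts with PNO == 1.
    groups: List[List[Result]] = []
    for r in results:
        if r[1] == 1:
            groups.append([])
        groups[-1].append(r)

    output: List[str] = []
    next_free = 0  # first cluster index not yet consumed by an accepted character
    for group in groups:
        if group[0][2] < next_free:
            continue  # interval begins inside an accepted pattern: excluded
        candidates = [r for r in group if r[4] is not None]  # discard rejects
        if not candidates:
            output.append("?")  # hypothetical marker when every candidate is rejected
            next_free = group[0][2] + 1
            continue
        best = max(candidates, key=lambda r: r[5])  # longest pattern width
        output.append(str(best[4]))
        next_free = best[3] + 1
    return output

stream = [
    (1, 1, 0, 0, "ノ", 3), (1, 2, 0, 1, "ル", 10),   # interval starting at ノ
    (2, 1, 1, 1, "レ", 6), (2, 2, 1, 2, None, 14),   # starts at レ (レ+亻 rejected)
    (3, 1, 2, 2, "イ", 4), (3, 2, 2, 3, "化", 12),   # starts at 亻
    (4, 1, 3, 3, None, 6), (4, 2, 3, 4, None, 16),   # starts at 匕
    (5, 1, 4, 4, "を", 10),                          # starts at を
]
print(decide(stream))  # ['ル', '化', 'を']
```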

Next, character cutting and character determination are explained using the line pattern 20 of FIG. 4 as an example.

For the patterns "ベ", "ク", and "ト" in the line pattern 20, the black-column data 30 contain only one black-column cluster within the interval α. Each is therefore cut out as the individual pattern 21 of one character, and its recognition result is sent to the output terminal 16 unchanged.

The interval α containing the next pattern, "ル", is called the first target interval. It contains two black-column clusters. The character cutting unit 13 therefore sends to the identification unit 14 the individual patterns "ノ" and "ル", formed by combining the two clusters in order, together with their cutting information. It then sets the position obtained by removing the first cluster, "ノ", as the reference position of the second target interval. The second target interval also contains two black-column clusters, so combination patterns and their cutting information are sent in the same way. The same applies to the third and fourth target intervals.

For the pattern "ノ" of the first target interval, the identification unit 14 outputs characters such as "ノ" or "1" as recognition results. For the pattern "ル", it outputs the character "ル".

For the first target interval, the character determination unit 15 outputs "ル" as the reading result, since it has the largest pattern width and a reliable recognition result. At the same time, it excludes the results of the second target interval, which contains the pattern "レ", and determines the next character from the results of the third target interval. From those results, the character "化" is output as the reading result, and the fourth interval is excluded. The following patterns, such as "を" and "進", are read as single characters in the same way.

FIG. 7 shows how the cutting, identification, and character determination processing is executed for the line pattern of FIG. 4. FIG. 8 shows the flow of that processing.

This embodiment has the following advantages.

- It distinguishes single-character patterns from other patterns by the number of black-column clusters within the interval α. It can therefore reliably tell intervals that should be cut out as one character from intervals that should yield several combination patterns.
- When several black-column clusters exist within the interval α, the position obtained by removing the first cluster becomes the reference position of the next interval. Every possible combination pattern can therefore be extracted, which improves reading accuracy.
- The character cutting unit only has to cut out patterns mechanically, according to the number of black-column clusters. Unlike the conventional device, it need not wait for the result of an auxiliary identification unit. The whole process can therefore be organized as a pipeline, which speeds up processing.

As explained above, the present invention provides a character reading method that performs character recognition by cutting out characters one at a time. The characters are cut from the pattern of a character line obtained by scanning and photoelectrically converting the text on a form. The method comprises two steps.

- Cutting step: count the black-column clusters within a predetermined fixed interval on the character line. If there is one cluster, cut out the interval as a one-character pattern. If there are several, cut out combination patterns formed by suitably combining the clusters in order, each treated as a one-character pattern. Output each cut-out pattern together with its cutting information.
- Character determination step: use the recognition results of the cut-out patterns and their cutting information. When the interval was treated as a one-character pattern, output the recognition result unchanged. When it was treated as several patterns, output the result corresponding to the combination pattern with the longest pattern width among the results of those combination patterns.

As a result, characters can be cut out unambiguously, without complex identification or judgment, from documents with a non-constant character pitch that contain separated or broken characters. This speeds up processing. Characters can also be read correctly even when part of a character resembles another character.

A further variant applies when several black-column clusters exist within an interval. Each pattern formed by adding consecutive clusters one at a time is cut out as a one-character pattern, and the position obtained by removing the first cluster becomes the reference position of the next interval. In this variant every possible combination pattern can be extracted, which improves reading accuracy and thereby widens the range of documents that can be read.

[Brief explanation of drawings]

The drawings serve to explain the present invention.

- FIG. 1 is a block diagram of a conventional character reading device.
- FIG. 2 is a block diagram of another conventional character reading device.
- FIG. 3 is a block diagram of an embodiment of a character reading device to which the method of the present invention is applied.
- FIG. 4 is an explanatory diagram showing an example of a line pattern and its black-column data.
- FIG. 5 is a flowchart of the character cutting unit 13.
- FIG. 6 is a flowchart of the character determination unit 15.
- FIG. 7 is an explanatory diagram showing how cutting, identification, and character determination are executed for the line pattern of FIG. 4.
- FIG. 8 is an explanatory diagram showing the flow of the processing of FIG. 7.

Reference numerals: 11: input terminal; 12: pattern memory; 13: character cutting unit; 14: identification unit; 15: character determination unit; 16: output terminal.

Claims

[Scope of Claims]

1. A character reading method that performs character recognition by cutting out characters one at a time from the pattern of a character line obtained by scanning and photoelectrically converting the text on a form, the method comprising:
   (a) a cutting step of counting the black-column clusters existing within a predetermined fixed interval on the character line; cutting out the interval as a one-character pattern if there is one cluster or, if there are several, cutting out a plurality of combination patterns formed by suitably combining the black-column clusters in order, each treated as a one-character pattern; and outputting each cut-out pattern together with information about its cutting; and
   (b) a character determination step of, based on the recognition result of each cut-out pattern and its cutting information, outputting the recognition result unchanged when the pattern was treated as a one-character pattern and, when the interval was treated as a plurality of patterns, outputting the recognition result corresponding to the combination pattern with the longest pattern width among the recognition results of the plurality of combination patterns.

2. A character reading method that performs character recognition by cutting out characters one at a time from the pattern of a character line obtained by scanning and photoelectrically converting the text on a form, the method comprising:
   (a) a cutting step of counting the black-column clusters existing within a predetermined fixed interval on the character line; cutting out the interval as a one-character pattern if there is one cluster or, if there are several, cutting out the patterns formed by combining consecutive black-column clusters while adding them one at a time, each treated as a one-character pattern; setting the position obtained by removing the first of those clusters as the reference position of the next fixed interval; and outputting each cut-out pattern together with information about its cutting; and
   (b) a character determination step of, based on the recognition result of each cut-out pattern and its cutting information, outputting the recognition result unchanged when the pattern was treated as a one-character pattern and, when the interval was treated as a plurality of patterns, outputting the recognition result corresponding to the combination pattern with the longest pattern width among the recognition results of the plurality of combination patterns.