JP3548234B2

JP3548234B2 - Character recognition method and device

Info

Publication number: JP3548234B2
Application number: JP14763194A
Authority: JP
Inventors: 裕章池田
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 1994-06-29
Filing date: 1994-06-29
Publication date: 2004-07-28
Anticipated expiration: 2019-07-28
Also published as: JPH0816719A

Description

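[0001]
[Industrial Field of Application]
The present invention relates to a character segmentation method, and to a character recognition method and apparatus that use it. The method extracts character-unit image blocks from a document image so that character recognition can be performed block by block.
[0002]
[Prior Art]
An optical character recognition (OCR) apparatus generally works as follows. It extracts character-unit image blocks from an optically read document image to segment the characters. It then performs character recognition on each segmented character. An example of the recognition processing performed by a typical OCR apparatus will be described with reference to FIGS. 8 and 9.
[0003]
FIG. 8 is a flowchart showing the procedure of character recognition processing in a typical OCR apparatus. First, in step S701, a document image is input using an image scanner or the like. Next, in step S702, image blocks are extracted from the input document image.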
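[0004]
FIG. 9 is a diagram illustrating a method of extracting image blocks used in general character recognition processing. It shows an example in which image blocks are extracted from a vertically written document by means of projections.

First, lines are extracted by taking a projection 801 in the vertical direction. After the lines have been extracted, image blocks 803 are obtained by taking, for each line, a projection 802 in the direction perpendicular to the line direction.

At this stage, however, the blocks are not yet character units. A block 804 may contain several characters because two or more characters touch each other. Conversely, a character whose parts are separated may be split into separate blocks 805.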
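The projection procedure of FIG. 9 can be sketched as follows. The sketch assumes a binarized image stored as a list of rows, with 1 marking ink. The names `runs` and `extract_vertical_blocks` are hypothetical. The right-to-left ordering of vertical lines is an assumption about Japanese reading order, not something the text specifies. Note that each block keeps the full width of its line, which is the weakness addressed in [0025].

```python
from typing import List, Tuple

Image = List[List[int]]          # binary image, 1 = ink, indexed as image[y][x]
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), end-exclusive


def runs(profile: List[int]) -> List[Tuple[int, int]]:
    """Return (start, end) index ranges where the projection profile is non-zero."""
    spans, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def extract_vertical_blocks(image: Image) -> List[Box]:
    """Projection-based block extraction for vertically written text."""
    height, width = len(image), len(image[0])
    # Projection along the vertical direction: ink count per column -> vertical lines.
    column_profile = [sum(image[y][x] for y in range(height)) for x in range(width)]
    blocks: List[Box] = []
    # Assumption: vertical lines are read right to left.
    for x0, x1 in reversed(runs(column_profile)):
        # Projection perpendicular to the line: ink count per row inside the line.
        row_profile = [sum(image[y][x0:x1]) for y in range(height)]
        for y0, y1 in runs(row_profile):
            blocks.append((x0, y0, x1, y1))  # block inherits the full line width
    return blocks
```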
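[0005]
Therefore, in step S703, character-unit image blocks (character image blocks) are created, as follows:

1. A standard character height is determined. This can be done by averaging the heights of the image blocks in each line, or by taking the most frequent height in the distribution of image-block heights.
2. If the height of an image block is an integer multiple of the standard character height, the block is split into that many parts of equal height. Each part becomes a character image block.
3. If combining several image blocks yields the standard character height, those blocks are combined into a single character image block.

For example, in FIG. 9, image block 804 is divided in two to produce two character image blocks. The two image blocks 805 are combined to produce one character image block.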
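Below is a minimal sketch of step S703 for one vertical line. It assumes blocks are given as (top, bottom) spans. The standard height is taken as the most frequent block height, which is one of the two options mentioned above. The relative tolerance `tol` is an assumption, since real block heights are rarely exact multiples. Both function names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

Span = Tuple[int, int]  # (top, bottom) of a block within one vertical line, end-exclusive


def standard_height(spans: List[Span]) -> int:
    """Most frequent block height in the line."""
    return Counter(b - a for a, b in spans).most_common(1)[0][0]


def make_character_blocks(spans: List[Span], tol: float = 0.15) -> List[Span]:
    """Split blocks about k * H tall into k parts; merge neighbours that together are about H.

    `tol` (relative tolerance) is an assumption; the text does not give one.
    """
    H = standard_height(spans)
    out: List[Span] = []
    i = 0
    while i < len(spans):
        top, bottom = spans[i]
        k = round((bottom - top) / H)
        if k >= 2 and abs((bottom - top) - k * H) <= tol * H:
            # Touching characters: cut the block into k parts of equal height.
            step = (bottom - top) / k
            out.extend((top + round(j * step), top + round((j + 1) * step)) for j in range(k))
            i += 1
            continue
        # Extend over following blocks while their union stays within (1 + tol) * H.
        j = i
        while j + 1 < len(spans) and spans[j + 1][1] - top <= (1 + tol) * H:
            j += 1
        if j > i and abs((spans[j][1] - top) - H) <= tol * H:
            out.append((top, spans[j][1]))  # separated parts of one character
            i = j + 1
        else:
            out.append((top, bottom))
            i += 1
    return out
```

For example, in a line whose standard height is 10, a 20-pixel block is cut in two. A 4-pixel fragment and a 5-pixel fragment that together span 10 pixels are merged.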
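[0006]
As described above, characters are segmented by generating character image blocks in steps S702 and S703. Then, in step S704, an identification computation is performed for each character image block. The category (character) with the highest similarity is taken as the recognition result.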
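The text does not say how similarity is computed. As an illustration only, the sketch below assumes size-normalized binary bitmaps and uses cosine similarity as the measure. `identify` returns the best category together with its similarity. That similarity is the value compared against thresholds later in this document. Both function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Bitmap = List[List[int]]  # binary bitmap normalised to a fixed size (assumption)


def cosine(a: Bitmap, b: Bitmap) -> float:
    """Cosine similarity between two equally sized binary bitmaps (assumed metric)."""
    fa = [v for row in a for v in row]
    fb = [v for row in b for v in row]
    dot = sum(x * y for x, y in zip(fa, fb))
    norm = math.sqrt(sum(x * x for x in fa)) * math.sqrt(sum(y * y for y in fb))
    return dot / norm if norm else 0.0


def identify(block: Bitmap, templates: Dict[str, Bitmap]) -> Tuple[str, float]:
    """Return the category with the highest similarity, and that similarity."""
    return max(((cat, cosine(block, tpl)) for cat, tpl in templates.items()),
               key=lambda pair: pair[1])
```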
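[0007]
[Problems to Be Solved by the Invention]
The segmentation method described above has a problem with horizontally set characters contained in a vertically written document, as shown in FIG. 10. Such characters are extracted together as a single character image block, as indicated by 901. They therefore cannot be segmented into individual characters. As a result, they become unrecognizable or cause misrecognition.
[0008]
The present invention has been made in view of this problem. Its object is to provide a character segmentation method, and a character recognition method and apparatus using it, with the following property: when characters in lines running in the row direction are segmented, character strings running in the column direction can also be segmented. This improves the character recognition rate.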
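[0009]
[Means for Solving the Problems]
To achieve the above object, a character recognition apparatus according to the present invention has the following configuration. That is,
a character recognition apparatus that performs character recognition processing on character images contained in a document image, comprising:
acquisition means for extracting lines on the basis of a projection of the document image in the row direction, and for acquiring rectangular areas, each containing a character image to be recognized, on the basis of a projection in the column direction orthogonal to each extracted line;
first recognition means for performing character recognition processing on each rectangular area acquired by the acquisition means, to obtain a recognized category and its similarity;
first determination means for determining whether to divide the rectangular area in the column direction, on the basis of whether the similarity obtained by the first recognition means is smaller than a predetermined threshold;
division means that divides in the column direction, by taking a projection in the row direction, a rectangular area judged from the result of the first recognition means to require division, and that does not divide in the column direction a rectangular area judged from that result not to require division;
second recognition means for performing character recognition processing on each of the rectangular areas obtained by the division means, to obtain a recognized category and its similarity for each;
second determination means for determining whether at least one of the similarities obtained for the divided rectangular areas is smaller than a predetermined threshold; and
output means for outputting a final result as follows. For a rectangular area judged not to require division, the output means outputs the category obtained by the first recognition means. For a rectangular area judged to require division, where at least one of the similarities obtained by the second recognition means is determined to be smaller than the predetermined threshold, it outputs the category obtained by the first recognition means. For a rectangular area judged to require division, where all of the similarities obtained by the second recognition means are determined to be equal to or greater than the predetermined threshold, it outputs the categories obtained by the second recognition means.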
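[0010]
To achieve the above object, a character recognition method according to the present invention is
a character recognition method for performing character recognition processing on character images contained in a document image, comprising:
an acquisition step of extracting lines on the basis of a projection of the document image in the row direction, and acquiring rectangular areas, each containing a character image to be recognized, on the basis of a projection in the column direction orthogonal to each extracted line;
a first recognition step of performing character recognition processing on each rectangular area acquired in the acquisition step, to obtain a recognized category and its similarity;
a first determination step of determining whether to divide the rectangular area in the column direction, on the basis of whether the similarity obtained in the first recognition step is smaller than a predetermined threshold;
a division step that divides in the column direction, by taking a projection in the row direction, a rectangular area judged from the result of the first recognition step to require division, and that does not divide in the column direction a rectangular area judged from that result not to require division;
a second recognition step of performing character recognition processing on each of the rectangular areas obtained in the division step, to obtain a recognized category and its similarity for each;
a second determination step of determining whether at least one of the similarities obtained for the divided rectangular areas is smaller than a predetermined threshold; and
an output step of outputting a final result as follows. For a rectangular area judged not to require division, the category obtained in the first recognition step is output. For a rectangular area judged to require division, where at least one of the similarities obtained in the second recognition step is determined to be smaller than the predetermined threshold, the category obtained in the first recognition step is output. For a rectangular area judged to require division, where all of the similarities obtained in the second recognition step are determined to be equal to or greater than the predetermined threshold, the categories obtained in the second recognition step are output.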
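The three-way output rule of the claims can be expressed compactly. In this sketch, `recognize` and `split` are hypothetical callbacks. They stand in for the recognition means and for the column-direction division by row-direction projection. Returning the first result when the split yields no parts is an assumption.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Region = TypeVar("Region")
Result = Tuple[str, float]  # (category, similarity)


def recognize_region(
    region: Region,
    recognize: Callable[[Region], Result],
    split: Callable[[Region], Sequence[Region]],
    threshold: float,
) -> List[str]:
    """Sketch of the output rule: keep, re-segment, or fall back."""
    category, similarity = recognize(region)        # first recognition
    if similarity >= threshold:                      # first determination: no division
        return [category]
    parts = split(region)                            # division in the column direction
    results = [recognize(part) for part in parts]    # second recognition
    if not results or any(s < threshold for _, s in results):  # second determination
        return [category]                            # fall back to the first result
    return [c for c, _ in results]                   # adopt the re-segmented result
```

Passing the recognizer and splitter in as callables keeps the decision logic separate from any particular feature extractor or projection routine.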
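[0018]
In the present invention, the row and column directions depend on the writing direction:

- In a vertically written document, the row direction is vertical and the column direction is horizontal.
- In a horizontally written document, the row direction is horizontal and the column direction is vertical.

[0019]
[Embodiments]
Preferred embodiments of the present invention will be described below with reference to the accompanying drawings.
[0020]
[Embodiment 1]
FIG. 1 is a block diagram showing the configuration of a character recognition apparatus for carrying out the present invention. The apparatus consists of the following components:

- 101 is a CPU. It performs various kinds of control in the apparatus in accordance with control programs stored in the ROM 102.
- 102 is a ROM. It stores the control programs executed by the CPU 101 and various data, including the control program that implements the processing shown in the flowcharts described later.
- 103 is a RAM. It provides a work area for the CPU 101 when it executes various processes. It also includes an area for storing optically read document images and the like.
- 104 is a display. It shows read images, recognition results, and so on.
- 105 is a keyboard, used to enter instructions and data.
- 106 is an image scanner, which reads images optically.

[0021]
Next, the operation of this embodiment will be described, as executed by the character recognition apparatus configured as shown in FIG. 1.
[0022]
FIG. 2 is a diagram outlining how character image blocks are segmented in this embodiment. In the figure:

- 201 is an image block of standard size. It is used as a character image block as it is.
- 202 is an image block of horizontally set characters. Its width (w1) is greater than the standard character width.
- 203 is a standard block representing the standard size of a character image block. It is calculated by a predetermined method from the image blocks segmented from the character image read by the scanner 106.

Multiplying the width ws of the standard block 203 by a predetermined coefficient gives a threshold T for identifying image blocks that contain horizontally set characters. The width of each segmented image block is compared with T. An image block wider than T is treated as containing horizontally set characters and undergoes division processing for horizontally set characters. In the example of FIG. 2, image block 202 satisfies w1 > T. A further projection in the row direction is therefore taken of this block, which divides it into three blocks.
[0023]
Next, the operation of Embodiment 1 will be described in more detail with reference to FIGS. 3 and 4.
[0024]
FIG. 3 is a flowchart showing the character recognition processing performed by the character recognition apparatus of Embodiment 1. First, in step S201, a document image is input using the image scanner 106, and the resulting image data is stored in the RAM 103. The input image may also be displayed on the display 104 at this time.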
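[0025]
Next, segmentation for vertical writing is applied to the vertically written document, and its result is stored in the RAM 103. Segmentation begins with the extraction of image blocks in step S202. As in the conventional example, step S202 first extracts lines by taking a vertical projection. It then obtains image blocks by taking a horizontal projection of each line.

With this procedure alone, however, horizontally set characters cause a problem. If a line contains them, their width becomes the width of the line, and every block extracted from that line inherits this width. In this embodiment, therefore, the blank margins on the left and right of each extracted image block are trimmed to correct its width.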
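The margin trimming of [0025] amounts to shrinking each block's x-range to the columns that actually contain ink. The sketch below assumes a binary image stored as a list of rows (1 = ink) and end-exclusive (x0, y0, x1, y1) boxes. The function name is hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]           # binary image, 1 = ink, image[y][x]
Box = Tuple[int, int, int, int]   # (x0, y0, x1, y1), end-exclusive


def trim_horizontal_margins(image: Image, box: Box) -> Box:
    """Shrink a block's x-range to the columns that contain ink."""
    x0, y0, x1, y1 = box
    inked = [x for x in range(x0, x1) if any(image[y][x] for y in range(y0, y1))]
    if not inked:
        return box  # blank block: leave unchanged
    return (inked[0], y0, inked[-1] + 1, y1)
```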
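[0026]
Next, in step S203, character-unit image blocks (character image blocks) are generated. As described for the prior art, each image block is split or combined as needed, so that each resulting image block contains one character.
[0027]
In the subsequent steps S204 to S208, every character image block is checked for horizontally set characters. Blocks judged to contain them are split further.
[0028]
In step S204, the character image blocks are examined one at a time to check whether the block of interest contains horizontally set characters. If it does, the process advances to step S205, where segmentation for horizontal writing is applied to it. Otherwise, the process advances to step S207. The method that step S204 uses to make this judgment will be described later.
[0029]
In segmentation for horizontal writing, image blocks are first extracted in step S205. This generally follows the same procedure as segmentation for vertical writing. A projection of the input image is taken in the left-right direction to extract horizontal lines. Each line is then projected in the vertical direction to cut out image blocks. In this embodiment, however, the processing is applied only to the block of interest. It may therefore start on the assumption that a horizontal line has already been extracted.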
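Because the block of interest is already a single horizontal line, step S205 reduces to one projection in the vertical direction (ink count per column), followed by cutting at blank columns. The sketch assumes a binary image stored as a list of rows (1 = ink) and end-exclusive (x0, y0, x1, y1) boxes. The name `split_horizontal_block` is hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]           # binary image, 1 = ink, image[y][x]
Box = Tuple[int, int, int, int]   # (x0, y0, x1, y1), end-exclusive


def split_horizontal_block(image: Image, box: Box) -> List[Box]:
    """Cut a block holding horizontally set text at its blank columns."""
    x0, y0, x1, y1 = box
    # Projection in the vertical direction: ink count per column inside the block.
    profile = [sum(image[y][x] for y in range(y0, y1)) for x in range(x0, x1)]
    pieces: List[Box] = []
    start = None
    for offset, count in enumerate(profile + [0]):  # sentinel closes a trailing run
        if count and start is None:
            start = offset
        elif not count and start is not None:
            pieces.append((x0 + start, y0, x0 + offset, y1))
            start = None
    return pieces
```

A block like 202 in FIG. 2 would come back as three narrower boxes, one per horizontally set character.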
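[0030]
Next, in step S206, character-unit image blocks are generated. This works as in the vertical case, with the "height" of the image blocks replaced by their "width". However, horizontally set text in a vertically written document contains few characters, so the standard character width often cannot be determined accurately. Step S206 may therefore be omitted, and the image blocks obtained in step S205 used directly as character image blocks. The results are then stored in the RAM 103.
[0031]
In step S207, the block of interest is advanced to the next character image block. In step S208, it is determined whether the check of step S204 has been completed for all character image blocks. If not, the process returns to step S204 and repeats the above processing.
[0032]
When the check of step S204 has been completed for all blocks, the process advances to step S210. In step S210, an identification computation is performed for each character image block. The category (character) with the highest similarity is taken as the recognition result and stored in the RAM 103. The recognition result may also be displayed on the display 104.
[0033]
Next, the determination made in step S204 will be described in more detail with reference to FIG. 4. FIG. 4 is a flowchart showing the procedure for determining whether a character image block contains horizontally set characters.
[0034]
First, in step S301, the widths of all character image blocks are obtained. Next, in step S302, the standard character width is determined, for example as the mean width of the character image blocks in the line. If very narrow items such as punctuation marks or symbols are included, however, the standard character width becomes less accurate. Processing such as the following is therefore applied.
[0035]
The standard character width can be estimated in either of two ways:

1. Average over normal-sized blocks only. Let H be the standard character height obtained when the character image blocks were generated in step S203. Let m and n (0 < m < n) be values determined empirically in advance. Only the character image blocks whose width w lies in the range mH < w < nH are selected, and their widths are averaged. Using this average as the standard character width improves its accuracy. Preferred values are about m = 0.7 and n = 1.3.
2. Take the most frequent class. A histogram of character-image-block widths is built, and the representative value of the most frequent class is used as the standard character width.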
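Both estimates of the standard character width in [0035] are short computations. The defaults m = 0.7 and n = 1.3 come from the text. The histogram class width `bin_size` is an assumption, and both function names are hypothetical.

```python
from collections import Counter
from statistics import mean
from typing import List, Optional


def standard_width(widths: List[int], H: float, m: float = 0.7, n: float = 1.3) -> Optional[float]:
    """Mean width of blocks with m*H < w < n*H, so narrow punctuation is excluded."""
    kept = [w for w in widths if m * H < w < n * H]
    return mean(kept) if kept else None


def standard_width_mode(widths: List[int], bin_size: int = 2) -> float:
    """Alternative: centre of the most frequent histogram class.

    `bin_size` is an assumption; the text does not specify class widths.
    """
    counts = Counter(w // bin_size for w in widths)
    top_bin = counts.most_common(1)[0][0]
    return top_bin * bin_size + (bin_size - 1) / 2
```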
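[0036]
In step S303, the standard character width ws obtained above is used to set a threshold T. T is used to judge whether each character image block contains horizontally set characters.

In a vertically written document, a block containing horizontally set characters is generally wider than the standard character width ws. An empirically determined value t (t > 1) is therefore used to compute the threshold T = t × ws. Comparing each block's width with T then determines whether the block contains a horizontally set character string. A value of about t = 1.2 is preferred.
[0037]
Accordingly, in step S304, the width w of each image block is compared with the threshold T. If w > T, the character image block is judged to contain horizontally set characters, and the process advances to step S305. If w ≤ T, the process advances to step S306, and the block is treated as an ordinary character image block.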
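Steps S303 and S304 can be sketched as follows. Here `ws` is the standard width from step S302, and t = 1.2 is the preferred value given in the text. Boxes are assumed to be end-exclusive (x0, y0, x1, y1) tuples. The function name is hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), end-exclusive


def classify_blocks(boxes: List[Box], ws: float, t: float = 1.2) -> List[bool]:
    """Steps S303-S304: True where a block is judged to contain horizontal text."""
    T = t * ws                                         # S303: threshold from standard width
    return [(x1 - x0) > T for x0, _, x1, _ in boxes]   # S304: w > T -> horizontal
```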
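[0038]
As described above, Embodiment 1 exploits a simple fact: in vertical writing, a character image block of horizontally set characters is wider than an ordinary character image block. This makes it possible to extract character image blocks that contain horizontally set characters. Applying segmentation for horizontal writing to those blocks then segments the horizontally set characters in a vertically written document into individual characters.
[0039]
The above embodiment is only one configuration for carrying out the present invention, and various applications are of course possible. For example:

- the apparatus may let an operator designate the recognition area or correct the recognition results;
- the recognition area may be determined automatically, without operator intervention;
- processing to reduce misrecognition may be added after this processing.

[0040]
Furthermore, the threshold T may be varied according to the height of each character image block. In Embodiment 1, a single threshold T is derived from the standard character width, which is in turn derived from the standard character height. This fails when quadruple-size characters (twice the width and twice the height) are mixed with full-width characters: such a character may be mistaken for horizontally set characters. The problem is eliminated by making T a function of the block height, for example T = k × h, where k is a constant and h is the height of the character image block.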
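The difference between a single threshold and the height-dependent variant T = k × h shows up on a quadruple-size character. The sketch below compares the two. The value k = 1.2 used in the example is an assumption, since the text leaves k unspecified. Both function names are hypothetical.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), end-exclusive


def is_horizontal_fixed(box: Box, T: float) -> bool:
    """Embodiment 1: one threshold T for every block."""
    x0, _, x1, _ = box
    return (x1 - x0) > T


def is_horizontal_scaled(box: Box, k: float) -> bool:
    """Variant of [0040]: the threshold T = k * h grows with the block height h."""
    x0, y0, x1, y1 = box
    return (x1 - x0) > k * (y1 - y0)
```

Assume a standard character of 20 × 20. A fixed T = 24 flags a 40 × 40 quadruple-size character as horizontal text. The scaled threshold 1.2 × 40 = 48 does not. A 36 × 20 block of horizontally set digits is still flagged by both.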
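[0041]
A program that carries out the processing of the present invention may also be supplied from outside to a general-purpose computer. In that case, the control program of this apparatus is stored in RAM.
[0042]
[Embodiment 2]
Embodiment 2 will now be described. In Embodiment 1, whether a character image block contains horizontally set characters is judged from the size of the block. In Embodiment 2, an identification computation is performed on each character image block first. Blocks for which no category reaches the predetermined similarity are then processed as horizontally set characters. The configuration of the character recognition apparatus of Embodiment 2 is the same as that of Embodiment 1 (FIG. 1), so its description is omitted here.
[0043]
FIG. 5 is a flowchart showing the character recognition procedure of Embodiment 2. As in the preceding embodiment, the procedure begins as follows:

1. A document image is input in step S401.
2. Image blocks are extracted in step S402.
3. The image blocks are converted into character-unit image blocks (character image blocks) in step S403.
4. An identification computation is performed for each character image block in step S404.

[0044]
Next, each character image block is examined in turn (step S405). The check determines whether the block should undergo character segmentation and identification as horizontally set characters. If the block of interest is to be reprocessed in this way, the process advances to step S406. Otherwise, it advances to step S410. The determination in step S405 will be described later with reference to the flowchart of FIG. 6.
[0045]
In step S406, segmentation for horizontal writing is applied to the character image block to extract new image blocks. In step S407, character image blocks are generated from the image blocks obtained in step S406. As explained in Embodiment 1, however, the standard character width cannot be determined accurately here. The splitting and combining of step S407 may therefore be omitted, and the image blocks obtained in step S406 used directly as character image blocks.
[0046]
In step S408, an identification computation is performed on the character image blocks newly segmented in steps S406 and S407. In step S409, two results are compared: that of the character image block before re-segmentation (segmented in steps S402 and S403), and that of the character image blocks after re-segmentation (segmented in steps S406 and S407). The result with the higher reliability is adopted as the recognition result. The reliability judgment in step S409 will be described later with reference to the flowchart of FIG. 7.
[0047]
In step S410, the block of interest is advanced to the next character image block. In step S411, it is determined whether all character image blocks have been examined. If unprocessed blocks remain, the process returns to step S405. When all character image blocks have been processed, this processing ends.
[0048]
Next, the determination method of step S405 will be described. FIG. 6 is a flowchart showing the procedure for determining, for each character image block, whether to reprocess it as horizontally set characters.
[0049]
In the identification computation of step S404, the similarity between each character image block and every category is calculated. The category with the highest similarity is taken as the result. Sometimes the image being identified resembles no character, as happens when characters have been segmented incorrectly. Its similarity is then generally lower than it would be after correct segmentation. This example uses this property to determine whether reprocessing is needed. Let V denote the similarity threshold used to decide whether character segmentation must be redone.
[0050]
First, in step S501, the predetermined similarity threshold V is compared with the similarity v of the character image block of interest. If v < V, the process advances to step S503, and the block is judged to require reprocessing. If v ≥ V, the process advances to step S502.

Step S502 consults a misrecognition list, stored in advance, of characters that are easily misrecognized, such as "10" read as "Ю" or "00" read as "∞". It checks whether the recognition result of step S404 is contained in the list. If it is, the process advances to step S503, and the block is judged to require reprocessing. The misrecognition list is stored in the ROM 102 or the RAM 103.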
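The decision of FIG. 6 is sketched below. The text does not specify how the misrecognition list is keyed. This sketch assumes it stores the single-character misreadings, for example "Ю" for a horizontally set "10". The set contents and the function name are hypothetical.

```python
from typing import Set

# Hypothetical contents, built from the two examples in the text:
# "10" misread as "Ю" and "00" misread as "∞".
MISRECOGNITION_LIST: Set[str] = {"Ю", "∞"}


def needs_reprocessing(category: str, v: float, V: float,
                       confusable: Set[str] = MISRECOGNITION_LIST) -> bool:
    """FIG. 6: re-segment as horizontal text if v < V (S501), or if the
    recognised category is a listed look-alike (S502)."""
    if v < V:
        return True                 # S501 -> S503
    return category in confusable   # S502 -> S503 when listed
```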
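[0051]
In this way, the character image blocks whose character segmentation is to be redone are selected.
[0052]
In step S409, one of the two recognition results, before or after re-segmentation, is adopted. This processing is described below. FIG. 7 is a flowchart showing the procedure for making this choice for a character image block that has undergone re-segmentation (that is, segmentation for horizontal writing).
[0053]
First, step S601 examines the image blocks produced by re-segmentation (the segmentation for horizontal writing in steps S406 and S407). It determines whether the identification computation of step S408 gave any of them a similarity smaller than the predetermined threshold V. If even one such block exists, the process advances to step S602. There, the original result before reprocessing (the result of step S404) is taken as the final recognition result. If instead all newly segmented character image blocks have similarities equal to or greater than V, segmentation of the horizontally set characters is deemed successful. In step S603, the result after reprocessing (the result of step S408) is then taken as the final result.
[0054]
As described above, Embodiment 2 uses the similarity from the identification computation to decide whether to apply the processing for horizontally set characters within vertical writing. This makes it possible to recognize horizontally set portions of a vertically written document. Moreover, once that processing has finished, the apparatus decides whether to adopt its result. This reduces misrecognition and improves recognition accuracy.
[0055]
Embodiment 2 may also incorporate the processing of Embodiment 1, which uses the size of a character image block to decide whether to reprocess it.
[0056]
As described above, horizontally set characters within vertical text can now be recognized, which was previously not possible. When a vertically written document is recognized, those characters are detected, and processing for horizontal writing is applied to them. This makes OCR input more accurate and reduces the need to correct misrecognized or unrecognizable characters.
[0057]
The above embodiments describe horizontally set characters mixed into a vertically written document. The same concept can of course be applied to vertically set characters mixed into a horizontally written document.
[0058]
The present invention may be applied to a system composed of multiple devices or to an apparatus consisting of a single device. It is also applicable when the invention is achieved by supplying a system or apparatus with a program that causes it to execute the processing defined by the present invention.
[0059]
[Effects of the Invention]
As described above, the present invention makes it possible, when segmenting characters contained in lines running in the row direction, to also segment character strings running in the column direction. This improves the character recognition rate.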
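[0060]
[Brief Description of the Drawings]
[FIG. 1] A block diagram showing the configuration of a character recognition apparatus for carrying out the present invention.
[FIG. 2] A diagram outlining the segmentation of character image blocks in this embodiment.
[FIG. 3] A flowchart showing the character recognition processing performed by the character recognition apparatus of Embodiment 1.
[FIG. 4] A flowchart showing the procedure for determining whether a character image block contains horizontally set characters.
[FIG. 5] A flowchart showing the character recognition procedure in Embodiment 2.
[FIG. 6] A flowchart showing the procedure for determining whether each character image block is to be reprocessed as horizontally set characters.
[FIG. 7] A flowchart showing the procedure for deciding which recognition result, before or after reprocessing, to adopt.
[FIG. 8] A flowchart showing the procedure of character recognition processing in a typical OCR apparatus.
[FIG. 9] A diagram illustrating the image-block extraction method used in typical character recognition processing.
[FIG. 10] A diagram showing an example of a vertically written document containing horizontally set characters.
[Description of Reference Numerals]
101 CPU
102 ROM
103 RAM
104 Display
105 Keyboard
106 Image scanner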
[Industrial applications]
The present invention relates to a character segmentation method for extracting an image block in character units from a document image and performing character recognition in block units, and a character recognition method and apparatus using the same.
[0002]
[Prior art]
Generally, in an optical character recognition device (OCR), an image block for each character is extracted from a document image optically read, and characters are cut out, and character recognition is performed for each cut out character. An example of a recognition process performed in a general OCR will be described with reference to FIGS.
[0003]
FIG. 8 is a flowchart showing a procedure of a general OCR character recognition process. First, in step S701, a document image is input using an image scanner or the like. Next, in step S702, image blocks are extracted from the input document image.
[0004]
FIG. 9 is a diagram illustrating a method of extracting an image block used in general character recognition processing. FIG. 9 shows an example of extracting image blocks of a vertically written document using projection. First, rows are extracted by taking projections 801 in the vertical direction. Then, after extracting the rows, the image block 803 can be extracted by taking a projection 802 for each row in a direction perpendicular to the row direction. In this state, there is a block 804 including a plurality of characters due to two or more characters touching, or a separated block 805 for a character having a separated portion.
[0005]
Therefore, in step S703, an image block (character image block) for each character is created. First, a standard character height is obtained by averaging the height of each image block in each line, extracting the most frequent height from the height distribution of the image blocks, and the like. Then, if the height of the image block is an integral multiple of the standard character height, the height is equally divided by the integer to separate the image blocks to obtain a character image block. Further, if the height of a plurality of image blocks becomes a standard character height when the image blocks are combined, a character image block is generated by combining the image blocks. For example, in FIG. 8, two image blocks are generated by dividing the image block 804 into two, and one image block is generated by combining the two image blocks with the image block 805.
[0006]
As described above, since character extraction is performed by generating a character image block in steps S702 and S703, an identification operation is performed for each character image block in step S704, and the category (character) having the highest similarity is determined. ) As the recognition result.
[0007]
[Problems to be solved by the invention]
However, in character extraction by the above-described method, horizontal characters included in a vertically written document as shown in FIG. 10 are extracted as one character image block as in 901. There is a problem that the character cannot be cut out in character units, and the recognition cannot be performed or a recognition error is caused.
[0008]
The present invention has been made in view of such a problem, and when extracting characters included in a line in a row direction, it is possible to extract a character from a character string in a column direction, and the character recognition rate is improved. And a character recognition method and apparatus using the same.
[0009]
[Means for Solving the Problems]
A character recognition device according to the present invention for achieving the above object has the following configuration. That is,
A character recognition device that performs a character recognition process on a character image included in a document image,
An acquisition unit that extracts a row based on a projection in a row direction of the document image and acquires a rectangular area including a recognition target character image based on a projection in a column direction orthogonal to the extracted row;
A first recognizing means for performing a character recognition process for each rectangular area obtained by the obtaining means to obtain a category of a recognition result and a similarity thereof;
First determining means for determining whether to divide the rectangular area in the column direction based on whether or not the similarity obtained by the first recognizing means is smaller than a predetermined threshold value;
The rectangular area determined to be divided by the first recognizing means is divided in the column direction by taking a projection in the row direction, and the rectangular area determined not to be divided by the first recognizing means. Is dividing means that does not perform the division in the column direction,
A second recognition unit that performs a character recognition process on each of the rectangular areas obtained by the division by the division unit and obtains a category of each recognition result and a similarity thereof;
Second determining means for determining whether at least one of the similarities obtained for each of the divided rectangular regions is smaller than a predetermined threshold,
For a rectangular area determined not to be divided by the first recognition means, the category obtained by the first recognition means is output as a final result, and it is determined by the first recognition means to be divided and the second recognition means For at least one of the rectangular areas determined to be smaller than the predetermined threshold value, the category obtained by the first recognizing means is output as a final result, and the first recognizing means determines to divide the area. Output means for outputting, as a final result, the category obtained by the second recognition means for a rectangular area which is determined and all of the similarities obtained by the second recognition means are equal to or larger than the predetermined threshold. Having.
[0010]
Further, the character recognition method according to the present invention for achieving the above object,
A character recognition method for performing a character recognition process on a character image included in a document image,
An acquisition step of extracting a row based on the projection in the row direction of the document image, and acquiring a rectangular area including the recognition target character image based on the projection in the column direction orthogonal to the extracted row,
A first recognition step of performing a character recognition process on each of the rectangular areas obtained in the obtaining step to obtain a category of a recognition result and a similarity thereof;
A first determination step of determining whether to divide the rectangular area in the column direction based on whether the similarity obtained in the first recognition step is smaller than a predetermined threshold value,
The projection in the row direction is performed on the rectangular area determined to be divided in the first recognition step to perform the division in the column direction, and the rectangular area determined not to be divided in the first recognition step is determined. Is a division step in which the division in the column direction is not performed,
A second recognition step of performing a character recognition process on each of the rectangular areas obtained by the division in the division step to obtain a category of each recognition result and a similarity thereof;
A second determining step of determining whether at least one of the similarities obtained for each of the divided rectangular areas is smaller than a predetermined threshold,
For the rectangular area determined not to be divided in the first recognition step, the category obtained in the first recognition step is output as a final result, and it is determined that the division is performed in the first recognition step and the second recognition step is performed. For a rectangular area in which at least one of the degrees of similarity obtained in the above is determined to be smaller than the predetermined threshold, the category obtained in the first recognition step is output as a final result, and it is determined that the category is divided in the first recognition step. An output step of outputting, as a final result, the category obtained in the second recognition step for a rectangular area that has been determined and all of the similarities obtained in the second recognition step are equal to or larger than the predetermined threshold. Having.
[0018]
In the present invention, the row direction and the column direction correspond to the vertical direction and the horizontal direction, respectively, in the case of a vertically written document. In a horizontally written document, the row direction corresponds to the horizontal direction, and the column direction corresponds to the vertical direction.
[0019]
【Example】
Preferred embodiments of the present invention will be described below with reference to the accompanying drawings.
[0020]
[Example 1]
FIG. 1 is a block diagram showing a configuration of a character recognition device for implementing the present invention. A CPU 101 performs various controls in the apparatus according to a control program stored in a ROM 102. A ROM 102 stores a control program executed by the CPU 101 and various data. A control program for realizing the processing shown in the flowchart described later is also stored in the ROM 102. A RAM 103 provides a work area when the CPU 101 executes various processes. The RAM 103 also includes an area for storing an optically read document image or the like. A display 104 displays a read image, various recognition results, and the like. A keyboard 105 is used to input various instructions and data. An image scanner 106 optically reads an image.
[0021]
Next, the operation of this embodiment executed by the character recognition device having the configuration shown in FIG. 1 will be described.
[0022]
FIG. 2 is a diagram illustrating an outline of cutting out a character image block according to the present embodiment. In the figure, reference numeral 201 denotes an image block having a standard size, which is directly used as a character image block. Reference numeral 202 denotes an image block for horizontal text, which has a horizontal width (w1) wider than the standard character width. Reference numeral 203 denotes a standard block representing a standard size of a character image block, which is calculated by a predetermined method from an image block cut out from a character image read by the scanner 106. By multiplying the horizontal width ws of the standard block 203 by a predetermined coefficient, a threshold value T for identifying an image block including horizontal characters is obtained. Then, the horizontal width of each of the extracted image blocks is compared with a threshold value T, and an image block having a horizontal width larger than T is an image block including a horizontal text, and a horizontal text division process is performed. For example, FIG. 2 shows a state in which the image block 202 satisfies w1> T, and the image block is further projected in the row direction, and as a result, is divided into three blocks.
[0023]
Next, the operation of the first embodiment will be described in more detail with reference to FIGS.
[0024]
FIG. 3 is a flowchart illustrating a character recognition process performed by the character recognition device according to the first embodiment. First, in step S201, a document image is input using the image scanner 106, and the obtained image data is stored in the RAM 103. At this time, the input image may be displayed on the display 104.
[0025]
Next, vertical writing characters are extracted from the vertical writing document, and the result is stored in the RAM 103. Next, in step S202, an image block is cut out. As described in the conventional example, vertical characters are cut out to extract lines by taking a vertical projection in step S202, and to take out image blocks by taking a horizontal projection for each line. However, if there is a horizontal character in the line as it is, the horizontal width of the horizontal character becomes the line width. Therefore, in this example, the left and right margins are cut off for each cut-out image block, and the width of each image block is corrected.
[0026]
Next, in step S203, an image block (character image block) for each character is generated. Here, as described in the description of the related art, an image block in units of one character is generated by performing separation and combination for each image block.
[0027]
Through the processing of the subsequent steps S204 to S208, it is determined whether or not all the character image blocks are horizontal characters, and the character image blocks determined to be horizontal characters are further separated.
[0028]
In step S204, the character image blocks are focused on one by one, and it is checked whether the character image blocks include horizontal characters. If it is determined that the target character image block includes horizontal characters, the process proceeds to step S205 to perform a horizontal writing character cutout process on the character image block. On the other hand, if it is determined that the character image block of interest does not include horizontal characters, the process proceeds to step S207. The method for determining whether or not the character is a horizontal character in step S204 will be described later.
[0029]
In the cutout processing for horizontal writing characters, first, an image block is extracted in step S205. Generally, the same processing is performed for the cutout processing for horizontally written characters as for the cutout processing for vertically written characters. That is, horizontal projections are extracted from the input image in the left-right direction to extract horizontal rows, and then image blocks are cut out by projecting each row in the vertical direction. However, in the case of the present embodiment, since the process is for the image block of interest, the process may be started assuming that a row in the horizontal direction has already been extracted.
[0030]
Next, in step S206, an image block for each character is generated. This can be performed in the same manner as in the case of vertical writing by replacing the “height” of the image block in the cutout of the character image block in the vertical writing character image with “width”. However, since the number of characters in a horizontally composed character of a vertically written document is small, the standard character width is often not accurately obtained. Therefore, this processing (step S206) may be omitted and the image block obtained in step S205 may be used as it is as a character image block. Then, the above result is stored in the RAM 103.
[0031]
In step S207, the character image block of interest is updated to the next block. In step S208, it is determined whether or not the investigation in step S204 has been completed for all the character image blocks, and if not completed, the flow returns to step S204 to repeat the above processing.
[0032]
On the other hand, if the investigation in step S204 has been completed for all blocks, the process proceeds to step S210. In step S210, a classification operation is performed for each character image block, and a category (character) having the highest similarity is set as a recognition result and stored in the RAM 103. Here, the recognition result may be displayed on the display 104.
[0033]
Next, the determination of whether or not the character is a horizontal character performed in step S204 will be described in more detail with reference to the flowchart of FIG. FIG. 4 is a flowchart illustrating a processing procedure for determining whether a character image block includes horizontal characters.
[0034]
First, in step S301, the widths of all the character image blocks are obtained. Next, in step S302, for example, the average of the widths of the character image blocks in the line is obtained, and the standard character width is obtained. Here, if punctuation marks, symbols, and the like include extremely small widths, the accuracy of the standard character width is reduced. For example, the following processing is performed.
[0035]
For example, based on the standard character height H obtained in the generation of the character image block in step S203 and the values m and n (0 <m <n) empirically obtained in advance, the image block width w becomes mH <w <. A character image block in the range of nH is extracted. Then, the average value is calculated using the extracted character image blocks, and the average value is used as the standard character width, whereby the accuracy of the standard character width can be improved. Alternatively, a distribution relating to the width of the character image block may be obtained, and the representative value of the class having the highest frequency may be set as the standard character width. Here, the values of m and n described above are preferably about m = 0.7 and n = 1.3.
[0036]
In step S303, using the obtained standard character width w, a threshold value T for determining whether or not each character image block includes horizontal characters is determined. Generally, in a vertically written document, the width of a character image block including horizontal characters is larger than the standard character width ws. Therefore, for example, a threshold value T = t × ws is obtained using an empirically obtained value t (t> 1), and the threshold value T is compared with the width of each character image block to obtain each character image block. It can be determined whether the block includes a horizontal character string. Here, the value of t is preferably about t = 1.2.
[0037]
Therefore, in step S304, the width w of each image block is compared with the threshold T. Here, if w> T, it is determined that the character image block includes horizontal characters, and the process proceeds to step S305. On the other hand, if w ≦ T, the process proceeds to step S306, where a normal character image block is set.
[0038]
As described above, according to the first embodiment, by utilizing the fact that the width of the character image block of the horizontal character in the vertical writing is larger than the width of the normal character image block, the character including the horizontal character is used. Image blocks can be extracted. Further, by performing the character extraction process for horizontal writing on the character image block, it becomes possible to extract horizontal characters included in a vertically written document in units of characters.
[0039]
The above embodiment is one configuration example for implementing the present invention, and it goes without saying that various applications are possible. For example, it is configured such that an operator can specify a recognition area, perform an operation of correcting a recognition result, automatically determine a recognition area without intervention of an operator, and reduce erroneous recognition after this processing. May be added.
[0040]
Further, the threshold value T may be changed according to the height of each character image block. In the first embodiment, one threshold value T is determined based on the standard character width obtained from the standard character height. However, when quadruple-width characters are mixed in full-width characters, the quadruple-width characters are displayed horizontally. It may be recognized as a kumimoji. Therefore, if the threshold T is a function of the height of the character image block (for example, threshold T = k × h, where k is a constant and h is the height of the character image block), such a problem can be solved.
[0041]
Further, it goes without saying that a program for performing the processing for carrying out the present invention may be externally provided to a general-purpose computer, and the control program of the present apparatus may be stored in the RAM.
[0042]
[Example 2]
Next, a second embodiment will be described. In the first embodiment, it is determined based on the size of the character image block whether or not the image is a character image block including horizontal characters. In the second embodiment, first, an identification operation is performed on each character image block, and a character image block in which the similarity is smaller than a predetermined value in only a category is processed as horizontal text. The configuration of the character recognition device according to the second embodiment is the same as that of the first embodiment (FIG. 1), and a description thereof will not be repeated.
[0043]
FIG. 5 is a flowchart illustrating the character recognition procedure in the second embodiment. As in the embodiment described above, a document image is first input in step S401, image blocks are extracted in step S402, and character-unit image blocks (character image blocks) are created in step S403. Then, in step S404, an identification operation is performed on each character image block.
[0044]
Next, each character image block is examined in turn, and it is determined whether the block should undergo character cutout and identification again as horizontally composed characters (step S405). If the character image block of interest is to be reprocessed as horizontally composed characters, the process proceeds to step S406; otherwise, the process proceeds to step S410. The determination in step S405 is described later with reference to the flowchart of FIG. 6.
[0045]
In step S406, character cutout for horizontal writing is applied to the character image block to extract new image blocks. In step S407, character image blocks are generated from the image blocks obtained in step S406. However, as described in the first embodiment, the standard character width cannot be obtained with high accuracy here, so the separation and combination of image blocks in step S407 may be omitted and the image blocks obtained in step S406 may be used directly as character image blocks.
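The re-cutout of step S406 can be sketched in Python as a projection split: black pixels are counted per column, and the block is cut wherever a column is empty, so characters set side by side inside a vertical line come apart. This is a minimal sketch under stated assumptions: the block is a binary image given as rows of 0/1 values, any all-white column is treated as a gap, and the helper names are hypothetical.

```python
# Sketch of step S406: split a character image block into side-by-side pieces
# by projecting black pixels onto the horizontal axis.
# ASSUMPTIONS: binary image as rows of 0/1 (1 = black); every all-white column
# separates two pieces; no noise threshold or minimum gap width is modelled.
from typing import List, Tuple

Image = List[List[int]]


def column_projection(block: Image) -> List[int]:
    """Number of black pixels in each column of the block."""
    return [sum(col) for col in zip(*block)]


def split_horizontal(block: Image) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges, end exclusive, of non-empty runs."""
    ranges: List[Tuple[int, int]] = []
    start = None
    for x, count in enumerate(column_projection(block)):
        if count and start is None:
            start = x                     # entering a run of black columns
        elif not count and start is not None:
            ranges.append((start, x))     # leaving a run
            start = None
    if start is not None:                 # run reaches the right edge
        ranges.append((start, len(block[0])))
    return ranges


def crop(block: Image, x0: int, x1: int) -> Image:
    """Cut columns x0..x1-1 out of the block as a new image block."""
    return [row[x0:x1] for row in block]
```

For a 4 × 7 block drawn as a "1" and a "0" separated by two empty columns, `split_horizontal` returns `[(0, 2), (4, 7)]`, and `crop` yields the two new image blocks. For the horizontally written case mentioned later in the document, the same functions could be applied to the transposed image.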
[0046]
In step S408, an identification operation is performed on the character image blocks newly cut out in steps S406 and S407. In step S409, the result for the character image block before the character re-cutout (the block cut out in steps S402 to S403) is compared with the results for the character image blocks after the re-cutout (the blocks cut out in steps S406 to S407), and the more reliable one is adopted as the recognition result. The reliability determination in step S409 is described later with reference to the flowchart of FIG. 7.
[0047]
In step S410, the next character image block becomes the block of interest. Then, in step S411, it is determined whether all the character image blocks have been checked; if any unprocessed character image block remains, the process returns to step S405. When all the character image blocks have been processed, this processing ends.
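The control flow of FIG. 5 can be summarized in a short Python sketch. Only the loop structure follows the text; the recognizer, the re-cutout and the two decision rules are passed in as hypothetical callables, and joining the re-cut categories into one string is an illustrative choice, not something the patent specifies.

```python
# Sketch of the FIG. 5 loop (steps S404 to S411) with pluggable steps.
# ASSUMPTION: identify/needs_reprocessing/recut/adopt_recut are hypothetical
# stand-ins for steps S404/S408, S405, S406-S407 and S409 respectively.
from typing import Callable, List, Tuple, TypeVar

B = TypeVar("B")                 # a character image block, any representation
Result = Tuple[str, float]       # (category, similarity)


def recognize_document(
    blocks: List[B],
    identify: Callable[[B], Result],
    needs_reprocessing: Callable[[Result], bool],
    recut: Callable[[B], List[B]],
    adopt_recut: Callable[[List[Result]], bool],
) -> List[str]:
    """Return one recognized string per character image block."""
    first_results = [identify(b) for b in blocks]    # S404 for every block
    output: List[str] = []
    for block, first in zip(blocks, first_results):  # S405-S411 loop
        if not needs_reprocessing(first):            # S405: no -> S410
            output.append(first[0])
            continue
        pieces = recut(block)                        # S406-S407
        second = [identify(p) for p in pieces]       # S408
        if second and adopt_recut(second):           # S409: keep re-cut result
            output.append("".join(cat for cat, _ in second))
        else:                                        # S409: keep original
            output.append(first[0])
    return output
```

Keeping each step as a parameter separates the loop from the decision rules of FIG. 6 and FIG. 7, so either rule can be changed (for example, by adding the size test of the first embodiment) without touching the loop.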
[0048]
Next, the determination method in step S405 described above will be described. FIG. 6 is a flowchart showing a procedure for determining whether or not each character image block is to be reprocessed as a horizontal character.
[0049]
In the identification operation of step S404, the similarity between each character image block and every category is calculated, and the category with the highest similarity is taken as the result of the identification operation. When the image being identified resembles no character, as happens when characters are cut out incorrectly, the similarity is generally lower than for a correctly cut-out image. This embodiment uses that property to determine whether reprocessing is necessary. Let V denote the similarity threshold used to determine whether the character cutout needs to be reprocessed.
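The identification operation (similarity to every category, then the best category) can be sketched as follows. The patent does not say how blocks are represented or how similarity is computed; this sketch assumes, purely for illustration, feature vectors compared by cosine similarity against one hypothetical template per category.

```python
# Sketch of an identification operation: score every category, keep the best.
# ASSUMPTION: feature vectors and cosine similarity stand in for the
# unspecified features and similarity measure of the patent.
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def identify(features: List[float],
             templates: Dict[str, List[float]]) -> Tuple[str, float]:
    """Return (category, similarity) for the most similar category."""
    scores = {cat: cosine(features, t) for cat, t in templates.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]
```

A vector that matches no template well, as a badly cut-out block would, gets a low best similarity, which is the property the threshold V relies on.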
[0050]
First, in step S501, the preset similarity threshold V is compared with the similarity v of the character image block of interest. If v < V, the process advances to step S503, and the block is determined to be a character image block to be reprocessed. If v ≥ V, the process proceeds to step S502. In step S502, it is checked whether the recognition result of step S404 appears in a misrecognition list, prepared in advance, that stores characters easily misrecognized for one another, such as "10" and "$" or "00" and "@". If the recognition result is in the list, the process proceeds to step S503, and the block is determined to be a character image block to be reprocessed; otherwise, the block is not reprocessed. The misrecognition list is stored in the ROM 102 or the RAM 103.
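The decision of FIG. 6 reduces to two checks, sketched below in Python. The list contents are hypothetical: the text only gives "10"/"$" and "00"/"@" as examples, so they are modelled here as a mapping from the recognized character to the horizontally composed string it may actually be; the value of V in the usage note is also illustrative.

```python
# Sketch of the FIG. 6 decision (steps S501 to S503).
# ASSUMPTION: hypothetical list contents built from the examples in the text.
from typing import Mapping

MISRECOGNITION_LIST: Mapping[str, str] = {"$": "10", "@": "00"}


def needs_reprocessing(category: str, similarity: float, threshold_v: float,
                       misrecognition_list: Mapping[str, str] = MISRECOGNITION_LIST
                       ) -> bool:
    """True if the block should be cut out again for horizontal writing."""
    if similarity < threshold_v:               # S501: v < V -> S503
        return True
    return category in misrecognition_list     # S502: listed -> S503
```

With V = 0.7, a block read as "A" with similarity 0.9 is left alone, while a block read as "$" is reprocessed even at high similarity, because a horizontally composed "10" can resemble "$" closely.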
[0051]
In this way, the character image blocks whose characters are to be cut out again are selected.
[0052]
In step S409, it is determined which of the recognition results, before or after the character re-cutout, is to be used. This processing is described below. FIG. 7 is a flowchart illustrating the procedure for determining, for a character image block whose characters have been cut out again (that is, cut out for horizontal writing), whether the result before or after the reprocessing is to be adopted.
[0053]
First, in step S601, it is determined whether any of the image blocks divided by the re-cutout (the horizontal-writing cutout in steps S406 to S407) has a similarity smaller than the threshold V in the identification operation of step S408. If even one block has a low similarity, the process proceeds to step S602, and the original result before reprocessing (that is, the result obtained in step S404) is taken as the final recognition result. If instead the similarities of all the newly cut-out character image blocks are V or greater, the horizontal-writing cutout is judged to have succeeded, and in step S603 the result after reprocessing (that is, the result obtained in step S408) is taken as the final result.
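Steps S601 to S603 can be sketched as a single Python function. The handling of an empty re-cutout (no pieces found) is an added assumption: the text does not cover that case, and the sketch simply falls back to the original result.

```python
# Sketch of the FIG. 7 decision (steps S601 to S603).
# ASSUMPTION: an empty re-cutout falls back to the step S404 result.
from typing import List, Tuple

Result = Tuple[str, float]   # (category, similarity)


def choose_result(before: Result, after: List[Result],
                  threshold_v: float) -> List[str]:
    """Return the categories adopted as the final result for one block."""
    # S601: does any newly cut-out block have similarity below V?
    if not after or any(sim < threshold_v for _, sim in after):
        return [before[0]]                 # S602: keep the step S404 result
    return [cat for cat, _ in after]       # S603: adopt the step S408 results
```

Requiring every piece to reach V makes adoption of the re-cut result conservative: one doubtful piece is enough to keep the original reading.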
[0054]
As described above, according to the second embodiment, the similarity obtained by the identification operation is used to determine whether to apply the processing for horizontally composed characters within vertical writing, which makes it possible to recognize horizontally composed portions. In addition, after the processing of the horizontally composed characters is completed, it is determined whether to adopt its result, which reduces misrecognition and improves recognition accuracy.
[0055]
The determination of whether to reprocess based on the size of the character image block, described in the first embodiment, may of course also be added to the second embodiment.
[0056]
As described above, when a vertically written document is recognized, horizontally composed characters within the vertical text are detected and processing for horizontal writing is applied to those portions. This enables recognition of horizontally composed characters in vertical text, which could not previously be recognized correctly, makes OCR input more accurate, and reduces the work of correcting misrecognized or unrecognizable characters.
[0057]
The embodiments above describe horizontally composed characters mixed into a vertically written document. Needless to say, vertically composed characters mixed into a horizontally written document can be handled by the same approach.
[0058]
The present invention may be applied to a system composed of a plurality of devices or to an apparatus consisting of a single device. It can, of course, also be applied where the invention is achieved by supplying a system or apparatus with a program that causes it to execute the processing defined by the present invention.
[0059]
[Effect of the Invention]
As described above, according to the present invention, when the characters in a line running in the row direction are cut out, characters in strings running in the column direction can also be cut out, which improves the character recognition rate.
[0060]
[Brief description of the drawings]
FIG. 1 is a block diagram showing a configuration of a character recognition device for implementing the present invention.
FIG. 2 is a diagram illustrating an outline of cutting out a character image block according to the embodiment.
FIG. 3 is a flowchart illustrating a character recognition process performed by the character recognition device according to the first embodiment.
FIG. 4 is a flowchart illustrating a processing procedure for determining whether or not a character image block includes a horizontal composition character.
FIG. 5 is a flowchart illustrating a procedure of character recognition according to the second embodiment.
FIG. 6 is a flowchart illustrating a procedure for determining whether or not each character image block is to be reprocessed as a horizontal character.
FIG. 7 is a flowchart illustrating a procedure for determining which recognition result before or after reprocessing is to be adopted.
FIG. 8 is a flowchart illustrating a procedure of a general OCR character recognition process.
FIG. 9 is a diagram illustrating a method of extracting image blocks used in general character recognition processing.
FIG. 10 is a diagram illustrating an example of a vertically written document including horizontal characters.
[Explanation of symbols]
101 CPU
102 ROM
103 RAM
104 display
105 keyboard
106 image scanner

Claims

A character recognition device that performs a character recognition process on a character image included in a document image, comprising:
acquisition means for extracting a line based on a projection of the document image in the row direction, and for acquiring a rectangular area containing a character image to be recognized based on a projection in the column direction orthogonal to the extracted line;
first recognizing means for performing a character recognition process on each rectangular area acquired by the acquisition means to obtain a recognition-result category and its similarity;
first determining means for determining whether to divide the rectangular area in the column direction, based on whether the similarity obtained by the first recognizing means is smaller than a predetermined threshold;
dividing means for dividing, in the column direction, each rectangular area that the first determining means has determined to divide, by taking a projection in the row direction, and for not dividing in the column direction any rectangular area that the first determining means has determined not to divide;
second recognizing means for performing a character recognition process on each rectangular area obtained by the division of the dividing means to obtain a recognition-result category and its similarity for each;
second determining means for determining whether at least one of the similarities obtained for the divided rectangular areas is smaller than a predetermined threshold; and
output means for outputting as the final result: for a rectangular area that the first determining means has determined not to divide, the category obtained by the first recognizing means; for a rectangular area that the first determining means has determined to divide and for which the second determining means has determined that at least one of the similarities obtained by the second recognizing means is smaller than the predetermined threshold, the category obtained by the first recognizing means; and for a rectangular area that the first determining means has determined to divide and for which all of the similarities obtained by the second recognizing means are equal to or greater than the predetermined threshold, the categories obtained by the second recognizing means.

A character recognition method for performing a character recognition process on a character image included in a document image, comprising:
an acquisition step of extracting a line based on a projection of the document image in the row direction, and acquiring a rectangular area containing a character image to be recognized based on a projection in the column direction orthogonal to the extracted line;
a first recognition step of performing a character recognition process on each rectangular area acquired in the acquisition step to obtain a recognition-result category and its similarity;
a first determination step of determining whether to divide the rectangular area in the column direction, based on whether the similarity obtained in the first recognition step is smaller than a predetermined threshold;
a division step of dividing, in the column direction, each rectangular area determined in the first determination step to be divided, by taking a projection in the row direction, and not dividing in the column direction any rectangular area determined in the first determination step not to be divided;
a second recognition step of performing a character recognition process on each rectangular area obtained by the division in the division step to obtain a recognition-result category and its similarity for each;
a second determination step of determining whether at least one of the similarities obtained for the divided rectangular areas is smaller than a predetermined threshold; and
an output step of outputting as the final result: for a rectangular area determined in the first determination step not to be divided, the category obtained in the first recognition step; for a rectangular area determined in the first determination step to be divided and for which at least one of the similarities obtained in the second recognition step is determined to be smaller than the predetermined threshold, the category obtained in the first recognition step; and for a rectangular area determined in the first determination step to be divided and for which all of the similarities obtained in the second recognition step are equal to or greater than the predetermined threshold, the categories obtained in the second recognition step.