JP4366011B2

JP4366011B2 - Document processing apparatus and method

Info

Publication number: JP4366011B2
Application number: JP2000388887A
Authority: JP
Inventors: 知俊金津
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2000-12-21
Filing date: 2000-12-21
Publication date: 2009-11-18
Anticipated expiration: 2020-12-21
Also published as: JP2002190957A; US7170647B2; US20020085243A1

Description

【０００１】
【発明の属する技術分野】
本発明は、文書の電子化処理を行うための文書処理装置及び方法に関し、特に文書の電子化処理に際して実行される像域分離処理に関する。
【０００２】
【従来の技術】
近年、情報の電子化が進み、文書を紙ではなく電子化して保存、あるいは送信するという需要が高まっている。特に、蓄積メディアの低価格化、および通信帯域の増大により、電子化の対象となる文書は白黒２値のものから、フルカラーの文書へと広がりつつある。
【０００３】
ここでいう文書の電子化とは、単に紙文書をスキャナなどにより光電変換し、画像データ化することのみにとどまらず、記載されている内容を認識して、文書を構成するテキスト、記号、図、写真、表などそれぞれ性質の異なる領域に分割し、文字部は文字コード情報、図はベクトルデータ、写真は画像データ、表は構造データ、といったように各々最も適した形態でデータ化する処理のことを指す。
【０００４】
そのような文書の電子化処理の第１段階として行われるのが、１頁の文書画像に書かれた内容を分析し、文字や図、写真、表など異なる性質の部分要素に分割する処理、すなわち領域分割処理である。図２５は領域分割の一例を示す図である。
【０００５】
このような領域分割処理の実現例としては、米国特許第５６８０４７８号の“Methodand Apparatus for character recognition”（Shin-Ywangら／CanonK.K）などが挙げられる。この例では、文書画像中の黒画素の８連結輪郭塊、白画素の４連結輪郭塊の集合を抽出し、その形状、大きさ、集合状態などから、文字領域、絵や図、表、枠、線といった文書に特徴的な領域を抽出している。図２５の例では、文字領域（ブロック１、３、４、６）、絵や図領域（ブロック２）、表領域（ブロック５）、枠、線（７）といった文書に特徴的な領域を抽出している。
【０００６】
ここで、黒画素の８連結輪郭塊（以下、黒画素塊）とは、図１４のように、ある黒画素から８方向のいずれかで連結している黒画素の集合体である。また、白画素の４連結輪郭塊（以下、白画素塊）とは、図１６のようにある白画素から４方向のいずれかで連結している白画素の集合体を指す。
【０００７】
上述の領域分割処理は、その動作原理より、入力となる文書画像が白黒２値であることが前提となる。従って、この技術を利用してカラー文書の領域分割を行うためには、あらかじめ文書画像の２値化を行う必要がある。一般にカラー画像の２値化は、画素の輝度分布から閾値を求め、画像の各画素を、前出の輝度閾値を境に白又は黒の画素へと変換することで行われる。
【０００８】
【発明が解決しようとする課題】
カラー画像の２値化のための閾値の求め方には、全画面単一で求める方法と、領域毎に求める方法が考えられる。本出願人による特願平１１−２３８５８１号にて提案されている２値化方法では、入力原稿の内容に応じて、領域毎に動的に最適な閾値を求め、これを用いることで領域毎に最適な２値化が行われる。特に低輝度下地上の高輝度文字と、高輝度下地上の低輝度文字が混在するカラー画像から、それらすべてが白地上の黒文字に自動的に変換されるような２値化を可能とし、領域分割処理の入力として最適な２値画像を得ることが可能である。
【０００９】
図２４は先に提案された２値化手法によって色付きの下地を含む文書に領域分離を行った様子を説明する図である。図２４において、カラーの文書２３０１は、下半分に濃い色付の下地領域があり、その上に薄い色の文字が載っているものとし、それ以外は薄い色の下地に濃い色の文字となっているものとする。この様な文書においては、上半分と下半分は意味的に分離しているであろうことが理解できる。
【００１０】
文書２３０１のようなカラー文書に対して、前述の２値化方式で２値化を行うと、図２４の２３０２で示されるような２値画像が生成される。２値画像２３０２では、下地は除去されてすべて白画素になり、文字はすべて黒画素になっている。このとき、２値画像２３０２に対し従来通り領域分割処理を行うと、図２４中２３０３のような結果が得られる、画面の下半分に存在した下地付の領域の情報が欠落しているために、本来ＴＥＸＴ１およびＴＥＸＴ２はそれぞれ中央で２つに分離すべきなのにもかかわらず、結合してしまっている。
【００１１】
つまり、本来カラー画像の持っている下地色による文字領域の範囲指定情報は２値化の際に失われてしまう。
【００１２】
本発明は上記の問題に鑑みてなされたものであり、色によって表された領域の区別を維持した領域分割を可能にすることを目的とする。
【００１３】
【課題を解決するための手段】
上記の目的を達成するための本発明の一態様による文書処理装置は、
輝度画像を２値化するための閾値を複数決定する第１決定手段と、
前記第１決定手段で決定された複数の閾値それぞれを用いて前記輝度画像を２値化することにより、複数の２値画像を取得する取得手段と、
前記取得手段で取得した複数の２値画像それぞれに含まれる黒画素の連結成分の領域に基づいて、下地の輝度が異なる各領域の位置と大きさとを示す領域情報を生成する生成手段と、
前記輝度画像における前記領域情報で示される各領域について、輝度値の反転を行うか否かを決定する第２決定手段と、
前記輝度画像における前記領域情報で示される各領域について、当該各領域を２値化するための閾値をそれぞれ設定する設定手段と、
前記輝度画像における前記領域情報で示される各領域について、前記第２決定手段で反転を行うと決定された領域に関しては、輝度値を反転した前記輝度画像内の対応する領域を、前記設定手段で当該領域に対して設定された閾値で２値化することにより２値部分画像を取得し、一方、前記第２決定手段で反転を行わないと決定された領域に関しては、輝度値を反転していない前記輝度画像内の対応する領域を、前記設定手段で当該領域に対して設定された閾値で２値化することにより２値部分画像を取得する２値部分画像取得手段と、
前記輝度画像の全面を所定閾値で２値化して得られる２値画像における対応する領域に、前記２値部分画像取得手段で取得した各領域の２値部分画像を上書きすることにより、前記輝度画像に関する第２の２値画像を得る２値化手段と、
前記２値化手段で得られた第２の２値画像から文書要素を抽出し、前記生成手段で生成された領域情報で示される各領域を枠の要素とし、前記文書要素と枠の要素とに基づいて木構造を得て、当該木構造に基づいて前記輝度画像の領域分割を行う分割処理手段とを備え、
前記分割処理手段で得る木構造は、前記抽出した文書要素のうち前記領域情報で示される各領域に含まれる文書要素を当該各領域に対応する枠の要素の子孫にした木構造である。
【００１４】
また、上記の目的を達成するための本発明の文書処理方法は、
第１決定手段が、輝度画像を２値化するための閾値を複数決定する第１決定工程と、
取得手段が、前記第１決定工程で決定された複数の閾値それぞれを用いて前記輝度画像を２値化することにより、複数の２値画像を取得する取得工程と、
生成手段が、前記取得工程で取得した複数の２値画像それぞれに含まれる黒画素の連結成分の領域に基づいて、下地の輝度が異なる各領域の位置と大きさとを示す領域情報を生成する生成工程と、
第２決定手段が、前記輝度画像における前記領域情報で示される各領域について、輝度値の反転を行うか否かを決定する第２決定工程と、
設定手段が、前記輝度画像における前記領域情報で示される各領域について、当該各領域を２値化するための閾値をそれぞれ設定する設定工程と、
２値部分画像取得手段が、前記輝度画像における前記領域情報で示される各領域について、前記第２決定工程で反転を行うと決定された領域に関しては、輝度値を反転した前記輝度画像内の対応する領域を、前記設定工程で当該領域に対して設定された閾値で２値化することにより２値部分画像を取得し、一方、前記第２決定工程で反転を行わないと決定された領域に関しては、輝度値を反転していない前記輝度画像内の対応する領域を、前記設定工程で当該領域に対して設定された閾値で２値化することにより２値部分画像を取得する２値部分画像取得工程と、
２値化手段が、前記輝度画像の全面を所定閾値で２値化して得られる２値画像における対応する領域に、前記２値部分画像取得工程で取得した各領域の２値部分画像を上書きすることにより、前記輝度画像に関する第２の２値画像を得る２値化工程と、
分割処理手段が、前記２値化工程で得られた第２の２値画像から文書要素を抽出し、前記生成工程で生成された領域情報で示される各領域を枠の要素とし、前記文書要素と枠の要素とに基づいて木構造を得て、当該木構造に基づいて前記輝度画像の領域分割を行う分割処理工程とを有し、
前記分割処理工程で得る木構造は、前記抽出した文書要素のうち前記領域情報で示される各領域に含まれる文書要素を当該各領域に対応する枠の要素の子孫にした木構造である。
【００１５】
【発明の実施の形態】
以下、添付の図面を参照して本発明の好適な実施形態を説明する。
【００１６】
図１は本実施形態による文書処理装置の装置概観を示す図である。図１において、１０１はコンピュータ装置であり、添付のフローチャートを参照して以下で説明する処理を実現するためのプログラムを含む、文書の電子化処理プログラムを実行する。コンピュータ装置１０１は、ユーザに状況や画像を表示するためのディスプレイ装置１０２、およびユーザの操作を受け付けるキーボードやマウス等のポインティングデバイスを含んで構成される入力装置１０３が付随する。ディスプレイ１０２表示デバイスとしては、ＣＲＴやＬＣＤ等が用いられる。１０４はスキャナ装置であり、文書画像を光学的に読み取り、電子化し、得られた画像データをコンピュータ装置１０１に送る。本実施形態では、カラースキャナを用いる。
【００１７】
図２は本実施形態による文書処理装置の構成を示すブロック図である。図１において、２０１はＣＰＵであり、ＲＯＭ２０２或いはＲＡＭ２０３に格納された制御プログラムを実行することにより、後述の電子化処理を含む各種機能を実現する。２０２はＲＯＭであり、ＣＰＵ２０１によって実行される各種制御プログラムやデータが格納される。２０３はＲＡＭであり、ＣＰＵ２０１によって実行される各種制御プログラムを格納したり、ＣＰＵ２０１が各種処理を実行するのに必要な作業領域を提供する。２０４は外部記憶装置であり、添付のフローチャートを参照して説明する処理をＣＰＵ１０１によって実現するための制御プログラムや、画像入力装置１０４を読み取って得られた文書画像データ等を格納する。２０５はコンピュータバスであり、上述の各構成を接続する。
【００１８】
図３は本実施形態の文書処理装置による文書の電子化処理の概要を示す図である。本実施形態による電子化処理の流れは、まず入力部３０１において電子化の対象とするカラー文書をスキャナ１０４を用いて読み込み、文書を画像データとして外部記憶装置２０４に格納する。次に、２値化部３０２において、後段の領域分割処理のために外部記憶装置２０４に格納された文書画像データに対して２値化処理を施す。領域分割部３０３では、２値化部３０２で得られた２値画像から、文字、図、表、枠、線などの要素を抽出し、各領域に分割する。電子化文書作成部３０４は、分割された各要素毎の属性に基づいて文字認識データや用いたり表構造データを用いたりして電子化文書を作成する。出力部３０５は生成された電子化文書を外部記憶装置２０４に格納する。なお、出力部３０５における出力の形態は外部記憶装置２０４への格納に限られるものではなく、ディスプレイ１０２へ表示出力したり、不図示のネットワークインターフェースを介してネットワーク上の他の装置へ出力したり、不図示のプリンタへ出力したりすることも可能である。
【００１９】
以下、図３に示した２値化部３０２の動作について図面参照して説明する。図４は本実施形態による２値化処理を説明するフローチャートである。また、図５は本実施形態の説明に用いるカラー文書画像の例を示す図である。なお、図５に示すカラー文書画像は、３つの下地色（下地Ａ５０１、下地Ｂ５０２、下地Ｃ５０３）を含み、下地Ａ５０１の部分が白、下地Ｂ５０２の部分が黄色、下地Ｃ５０３の部分が青であるとする。一方、文字色は、文字列Ａ５０４，文字列Ｂ５０５が共に黒、文字列Ｃ５０６は白である。
【００２０】
まず、ステップＳ４０１にて処理対象のカラー文書画像を輝度画像に変換する。ここでは一般的に元画像のカラー形式をＲＧＢ形式、輝度画像は１画素あたり０〜２５５のグレイスケール形式とし、各画素の輝度Ｙは元画像の画素値Ｒ，Ｇ，Ｂに対し、Ｙ＝０．２９９Ｒ＋０．５８７Ｇ＋０．１１４Ｂによって求めることにするが、他の形式／変換式を用いてもよいことはもちろんである。これにより図５の下地色Ａ５０１，下地色Ｂ５０２，下地色Ｃ５０３の各部分の輝度はそれぞれ２５５，２００，４０（実際は紙面状態や光電変換時のばらつきによって、同じ下地色であっても位置によって輝度値がばらつく。したがって、図６のようなヒストグラム曲線の山は、ある程度の幅を有することになる）へと変換される。
【００２１】
次に、ステップＳ４０２にて、ステップＳ４０１で得られた輝度画像データのヒストグラムをとる。図６は図５に示したカラー画像から得られる輝度値のヒストグラムを示す図である。ヒストグラムが得られたならば、ステップＳ４０３において、そのヒストグラムから複数の閾値を決定する。例えば図６に示したヒストグラムからは、ｔ１，ｔ２の２個の閾値が抽出される。
【００２２】
この複数の閾値ｔ１，ｔ２は例えば以下のような手順で決定できる。ヒストグラムの輝度値２５５（白）から０（黒）まで参照する。参照した輝度値から閾値を選択する条件は、例えば、以下の各条件を設定する。
【００２３】
条件１：当該ヒストグラム曲線において、現在の参照点と、その参照点から横軸の輝度値の大きい方向に例えば１０画素分だけさかのぼった点との間に含まれる合計度数（面積）が、所定の第１の値より大きい。
【００２４】
条件２：当該ヒストグラム曲線における縦軸の度数の急激な減少が発生（例えば「所定の第１の傾きより大」が１回、または「所定の第２の傾きより大」が連続して２回発生）。
【００２５】
条件３：当該ヒストグラム曲線の緩い減少、または上昇が発生（所定の第３の傾きより小）。
【００２６】
本実施形態において、図６に示すｔ１、ｔ２の各点は、上記の各条件のうち、「条件１」が満足された後に「条件２」が満足され、その後更に「条件３」が満足された結果として検出した。尚、参照点は、当該ヒストグラム曲線を所定の間隔で参照するようにしてもよい。また、本実施形態では以上のような条件を用いたが、閾値の決定はこれに限るものではない。例えば更に以下のような条件４〜６を同時に満たす場合を閾値として選択しても良い。
【００２７】
条件４：当該ヒストグラム曲線において、現在の参照点と、その参照点から横軸の輝度値の大きい方向に例えば４０画素さかのぼった点との間に含まれる合計度数（面積）が、所定の第２の値より大きい。
【００２８】
条件５：当該ヒストグラム曲線上の現在の参照点における縦軸の度数が、所定の第３の値より小さい。
【００２９】
条件６：当該ヒストグラム曲線において、現在の参照点と、その参照点から横軸の輝度値の大きい方向に例えば２０画素さかのぼった点におけるヒストグラム値が、所定の第４の値より大きい。
【００３０】
ステップＳ４０４にて、各閾値から複数の２値画像を作成する。これらはテンポラリの２値画像である。本例では、閾値ｔ１で２値化することにより、図７の（ａ）に示す２値画像７０１が、閾値ｔ２で２値化することにより図７の（ｂ）に示す２値画像７０２が生成される。ここで、領域７０３及び７０４はそれぞれ図５に示す下地Ｂ５０２、Ｃ５０３の領域である。閾値ｔ１では下地Ｂ５０２と下地Ｃ５０３の部分も黒となり、閾値ｔ２では、下地５０２の部分は白となり、下地Ｃ５０３の部分は黒となる。
【００３１】
次に、ステップＳ４０５にて、ステップＳ４０４で得られた各２値画像中の黒領域を抽出し、これを下地色の異なる領域を示す領域情報として記録する。これは、２値画像上の黒画素の連結成分を追跡して、一定の大きさ以上となる四角形の領域を抽出する処理である。図７の例では、２値画像７０１より領域７０３が、２値画像７０２より領域７０４がそれぞれ抽出される。
【００３２】
次に、ステップＳ４０６にて、ステップＳ４０５で抽出した各領域内の解析を行う。ここでは、ステップＳ４０１で得られた輝度画像より、ステップＳ４０５で抽出された各領域の範囲を切り出し、各範囲毎に輝度値のヒストグラムをとり、このヒストグラムから各領域について輝度値の反転が必要か否か判断し、必要な場合は輝度値を反転する。更に、このヒストグラムから各領域内に最適な２値化閾値を、図６と同様にして求める（ただし、対象とする領域の大きさが小さくなるので判断に用いる所定の値は異なる）。求められた２値化閾値及び反転情報は、各領域情報に付随するように出力される。尚、反転が必要か否かは、例えば以下に示す計算式により、ヒストグラムの平均値(average)とそのスキュー(skew)とを求めて判断できる。
【００３３】
(average)＝Σip(i),
σ²＝Σ(i-av)²p(i),
(skew)＝1/σ²Σ(i-av)³p(i),
但し、Σip(i)は、ip(i)のi＝０からi＝２５５までの場合の総和、p(i)は確率密度、avは平均値を表わす。
【００３４】
このスキューの絶対値がしきい値より大きいときには、その領域は文字を含むと判断し、更に文字を含むと判断されて且つスキューが正のときには、領域の画像を反転すべきと判断して反転フラグがyesになる。
【００３５】
本例では、領域７０３より図８に示されるヒストグラムが求められて反転必要なしと判断されて該ヒストグラムから閾値ｔ２１が求められる。さらに、領域７０４より図９に示されるヒストグラムが求められて反転必要ありと判断され、図９のヒストグラムを反転して閾値ｔ２２が求められる（尚、図９は反転前のヒストグラムを表し、閾値ｔ２２は反転前のヒストグラムの対応する値として示している）。そして、図１０に示すように、それぞれの領域毎に、座標値と２値化閾値、反転情報を１組とした領域情報が記憶される。
【００３６】
本例では、領域７０３に対する領域情報１００１は、座標値として矩形の左端のｘ座標ｌ１、右端のｘ座標ｒ１、上端のｔ１、下端のｂ１を、２値化閾値として図８のヒストグラムから求まった閾値ｔ２１を、反転情報として反転無し（ｎｏ）を有する。同様に、領域７０４に対する領域情報１００２は、座標値として矩形の左端のｘ座標ｌ２、右端のｘ座標ｒ２、上端のｔ２、下端のｂ２を、２値化閾値として図９のヒストグラムから求まった閾値ｔ２２を、反転情報として反転有り（ｙｅｓ）を有する。
【００３７】
次いで、ステップＳ４０７にて、ステップＳ４０６で得られた領域情報を基に、ステップＳ４０１で得た輝度画像から最終的な２値画像を生成する。本例によれば、
（１）全面を閾値ｔ１で２値化して２値画像７０１を得る。
（２）２値画像７０１上の領域７０３の部分の輝度画像について、領域情報１００１の内容にしたがって処理（この場合閾値（ｔ２１）で２値化）を行い、得られた画像を領域７０３の部分に上書きする。この結果、２値画像７０２が得られる。
（３）領域７０４については、対応する領域情報１００２が反転有りを示すので、領域７０４の部分の輝度画像について反転処理を行い、閾値ｔ２２を用いて２値化して得られた画像を領域７０４の部分に上書きする。この結果、図５に示したカラー画像から図１１に示すような最終２値画像が得られることになる。
【００３８】
次に、以上のようにして２値化部３０２によって得られた２値画像と領域情報を用いて、領域分割部３０３が領域分割を行う。以下、領域分割部３０３の処理について説明する。
【００３９】
図１２は本実施形態の領域分割処理を説明するフローチャートである。領域分割部３０３では、ステップＳ１２０１にて、２値化部３０２により得られた２値化された文書画像から文書要素が抽出され、それらの木構造表現が作られる。このステップＳ１２０１の処理の詳細について図１３のフローチャートを用いて説明する。図１３は図１２のステップＳ１２０１の要素抽出及び木構造化の処理を説明するフローチャートである。
【００４０】
ステップＳ１３０１では、２値化画像からすべての黒画素塊を抽出する。なお、黒画素塊とは、上述したように黒画素の８連結輪郭塊であり、図１４に示すように、縦横斜めに接触した画素で作られた輪郭を持つ黒画素の集合のことである。続くステップＳ１３０２では、抽出した黒画素塊が、予め予想される最大文字高さおよび幅（予め実験的に求めた値）に対し定められた閾値以下の大きさを有するかどうかを判定し、閾値以下である場合はステップＳ１３０８に進み、当該黒画素塊を文字要素と判定する。これを“CHAR”と呼ぶ。
【００４１】
ステップＳ１３０３では、抽出した黒画素が一定比率以上で縦長あるいは横長であるかどうかを判断する。当該黒画素塊が一定比率以上で縦長あるいは横長であった場合は、ステップＳ１３０９において“ＬＩＮＥ”と判定する。また、ステップＳ１３０４では、抽出した黒画素塊中の黒画素のなす輪郭に注目し、その形状が細い斜めの線状であった場合は、ステップＳ１３０９へ進み、当該黒画素塊を“ＬＩＮＥ”と判定する。
【００４２】
ステップＳ１３０５では、黒画素塊の輪郭形状が四角形かどうかを調べる。図１５は、（ａ）黒画素塊の輪郭が四角形の場合と、（ｂ）黒画素塊の輪郭が非四角形である場合の例を示す図である。ステップＳ１３０５において、黒画素塊の輪郭形状が四角形でなければ、ステップＳ１３１２へ進み、当該黒画素塊を“PICTURE”と判定する。
【００４３】
一方、黒画素塊が四角形ならばステップＳ１３０６へ進む。ステップＳ１３０６では、黒画素塊の内部に存在する白画素の４連結輪郭塊を抽出する。白画素の４連結輪郭塊とは、図１６のように、縦横のみに接触した画素で作られた輪郭を持つ白画素の集合のことである。以降この集合を白画素塊と呼ぶ。
【００４４】
ステップＳ１３０７では、ステップＳ１３０６で黒画素塊から抽出された白画素塊の形状がすべて四角形であり、かつ黒画素塊内を所定の間隔で隙間なく埋めているかどうかを判定する。この判定の結果がＹＥＳであった場合は、ステップＳ１３１１へ進み、当該黒画素塊を“FRAME”と判定する。図１７は、枠（FRAME）と図（PICTURE）における内部白画素塊の配置例を示す図である。ステップＳ１３０７の判定によれば、（ａ）、（ｂ）はステップＳ１３０７の判定条件を満たすので、ステップＳ１３１１で枠（FRAME）であると判定されることになる。
【００４５】
また、図１７の（ｃ）に示した白画素塊の配列は、「黒画素塊から抽出された白画素塊の形状がすべて四角形であり、かつ黒画素塊内を隙間なく埋めている」という条件を満たしておらず、ステップＳ１３１２において図（PICTURE）と判定されることになる。結局、ステップＳ１３１２では、ステップＳ１３０７までの条件のいずれにもあてはまらない黒画素塊を、“PICTURE”としている。
【００４６】
さて、本実施形態では、各要素の親となる要素“GROUND”を導入する。画面全体をひとつの“GROUND”とすると、これまで画像から抽出された各要素は、すべてその子供の要素として表現される。そして、“FRAME”と判定された黒画素塊の内部から抽出された白画素塊のひとつひとつをそれぞれ“GROUND”とし、更にこの白画素の内部で、上述したステップＳ１３０１〜Ｓ１３１２の処理を行って子供となる要素を抽出する（ステップＳ１３１３）。なお、“FRAME”の内部で更に“FRAME”が抽出されたときには、これをGROUNDとしてさらに再帰的に処理を行う。
【００４７】
すべての再帰的内部探索が終了した時点で、画像から抽出された要素は木構造を構成することになる。図１８は、文書画像の例と、これをステップＳ１２０１の要素抽出、木構造化によって処理して得られる木構造の例を示す図である。図１８の（ａ）に示されるように、文書画像１８０１は、テキスト列（CHAR）１８０２、１８０７、１８０８と、フレーム（FRAME）１８０４と、図（PICTURE）１８０３、１８０９とを有する。
【００４８】
この文書画像１８０１を上述の処理によって木構造化すると、図１８の（ｂ）のようになる。GROUND１８２１は文書画像１８０１の全体を示し、その要素の一つであるFRAME１８２４は文書画像１８０１中のフレーム１８０４に対応する。更にフレーム１８０４は２つのフレームに分けられ、木構造上では、それぞれGROUND１８２５、１８２６として示されている。
【００４９】
以上のようにして、図１２のステップＳ１２０１で文書画像要素の木構造を得ると、ステップＳ１２０２において、ステップＳ４０６で取得した領域情報を木構造に当てはめる。即ち、２値化部３０２による２値化処理の際に記録された領域情報を参照して、ステップＳ３０１で得られた木構造を変更する。このステップＳ１２０２の処理について、図１９のフローチャートを用いて説明する。
【００５０】
ステップＳ１９０１では、領域情報が存在するかどうかを調べる。領域情報がなければ、そのまま本処理を終了する。領域情報が存在する場合は、ステップＳ１９０２へ進み、領域を仮想的に“FRAME”および“GROUND”の組と考えて要素の木構造の適当な箇所に挿入する。すなわち、領域を内包する“GROUND”を親とする位置に、その領域情報が表す矩形に対応する新たな“FRAME”を挿入し、その子として新たな“GROUND”を置く。
【００５１】
ステップＳ１９０３では、ステップＳ１９０２で挿入した“FRAME”と同じ親（GROUND）を持つ兄弟のうち、当該領域内部に位置する要素をすべて自分の子孫、すなわち新たな“GROUND”の子供へと移動する。
【００５２】
図２０は、上述した木構造変更処理の例を説明する図である。文書画像２００１は色付きの下地領域２００２を有する。２値化部３０２はこの文書画像２００１を２値化して２値画像２０１０を生成するとともに、領域２００２に対応する領域情報２０２０を生成する（図４）。領域分割部３０３は、２値画像２０１０について図１３のフローチャートで説明した処理を施すことにより、画像全体をGROUND２０３１とする木構造２０３０を生成する（Ｓ１２０１）。そして、木構造２０３０に、領域情報２０２０を当てはめて、木構造を変更し、木構造２０４０を得る。
【００５３】
より具体的には、GROUND２０３１を親として、領域２００２に相当する枠（FRAME）を挿入して、その子としてGROUND２０４１を置く。そして、領域２００２に含まれる各要素（Ａ，Ｂ，Ｃ，図形）をGROUND２０４１の子として配置することにより、木構造を変更する。
【００５４】
以上のようにして、ステップＳ１２０２の処理を終えたならば、ステップＳ１２０３へ進む。ステップＳ１２０３では、文字要素をグループ化して行および文字領域を作成する。ステップＳ１２０３の処理について、図２１のフローチャートを用いて説明する。図２１はステップＳ１２０３における文字領域の作成処理を説明するフローチャートである。
【００５５】
ステップＳ２１０１では、各“CHAR”に対し、それぞれ隣り合う“CHAR”との水平距離が閾値内にあるもの同志をグループ化する。このグループを“TEXTLINE”と呼ぶ。但し、これらのグループ化は、同じ親を持つ“CHAR”間のみで行われる。
【００５６】
次に、ステップＳ２１０２において、更にこの“TEXTLINE”同志で、それぞれ隣り合う垂直距離が閾値内にあるもの同志をグループ化する。こうして得られた“TEXTLINE”のグループを“TEXT”あるいは文字領域と呼ぶ。但し、これらのグループ化は同じ親をもつ“CHAR”からなる“TEXTLINE”の間のみで行う。
【００５７】
以上の処理で、文書画像は文字領域である“TEXT”、線の画像部分である“ＬLINE”、図や写真の領域である“PICTURE”、表や枠の領域である“FRAME”という各要素に分割される。
【００５８】
なお、ステップＳ２１０３における文字領域作成時のグループ化は、木構造を意識して行われるので、例えば図２２のように、２値画像上の実線の枠＝“FRAME１”がある場合、“FRAME1”内外それぞれの文字は必ず異なるグループとなる。すなわち、“TEXT1”と“TEXT3”、“TEXT2”と“TEXT4”のように異なる文字領域にグループ化される。
【００５９】
同様に、図２３におけるようなカラー画像の処理の場合、２値化処理後の画像上では枠となるような情報がなくても、ステップＳ１２０２における処理により、２値化処理時に得られた領域情報が領域分割に反映され、色下地上の文字は他と異なる文字領域となり、図２２と同様な正しい領域分割結果が得られる。即ち、２値化処理部３０２によって、文書画像２３０１の２値画像２３１０が得られるとともに領域情報２３２０が得られる。２値画像２３１０を領域分割する際に、ステップＳ１２０２の処理により、領域情報２３２０が反映されて、２３３０に示す如く領域分割結果が得られる。
【００６０】
もし、ステップＳ１２０２の処理を行わないで領域分割処理を行った場合は、下地の情報が反映されることがないので、図２４のように誤った文字領域が得られることになる。
【００６１】
以上説明したように、本実施形態によれば、カラー画像を２値化して領域分割処理を行う際に、２値化処理時に記憶された色付下地の領域情報を用いて領域分割処理の解析内容を変更するので、カラー画像の分割処理に際して２値化の際に失われてしまった情報をも正しく反映した文字領域の抽出が可能になり、より高精度な領域分割処理が可能になる。
【００６２】
なお、上記実施形態によれば、２値化部３０２が保存する領域情報、および領域分割部３０３が処理する“FRAME”領域は四角形に限定していたが、これは矩形の任意連接体、あるいは円、楕円などの領域を対象とするようにしてもよい。この場合でも、カラー画像にあって、２値化の際に失われてしまった情報をも正しく反映した文字領域の抽出が可能になり、より高精度な領域分割処理が可能になる。
【００６３】
なお、本発明は、複数の機器（例えばホストコンピュータ、インタフェイス機器、リーダ、プリンタなど）から構成されるシステムに適用しても、一つの機器からなる装置（例えば、複写機、ファクシミリ装置など）に適用してもよい。
【００６４】
また、本発明の目的は、前述した実施形態の機能を実現するソフトウェアのプログラムコードを記録した記憶媒体（または記録媒体）を、システムあるいは装置に供給し、そのシステムあるいは装置のコンピュータ（またはＣＰＵやＭＰＵ）が記憶媒体に格納されたプログラムコードを読み出し実行することによっても、達成されることは言うまでもない。この場合、記憶媒体から読み出されたプログラムコード自体が前述した実施形態の機能を実現することになり、そのプログラムコードを記憶した記憶媒体は本発明を構成することになる。また、コンピュータが読み出したプログラムコードを実行することにより、前述した実施形態の機能が実現されるだけでなく、そのプログラムコードの指示に基づき、コンピュータ上で稼働しているオペレーティングシステム（ＯＳ）などが実際の処理の一部または全部を行い、その処理によって前述した実施形態の機能が実現される場合も含まれることは言うまでもない。
【００６５】
さらに、記憶媒体から読み出されたプログラムコードが、コンピュータに挿入された機能拡張カードやコンピュータに接続された機能拡張ユニットに備わるメモリに書込まれた後、そのプログラムコードの指示に基づき、その機能拡張カードや機能拡張ユニットに備わるＣＰＵなどが実際の処理の一部または全部を行い、その処理によって前述した実施形態の機能が実現される場合も含まれることは言うまでもない。
【００６６】
【発明の効果】
以上説明したように、本発明によれば、色によって表された領域の区別を維持した領域分割が可能となる。
【図面の簡単な説明】
【図１】本実施形態による文書処理装置の装置概観を示す図である。
【図２】本実施形態による文書処理装置の構成を示すブロック図である。
【図３】本実施形態の文書処理装置による文書の電子化処理の概要を示す図である。
【図４】本実施形態による２値化処理を説明するフローチャートである。
【図５】本実施形態の説明に用いるカラー文書画像の例を示す図である。
【図６】図５に示したカラー画像から得られる輝度値のヒストグラムを示す図である。
【図７】（ａ）は図５に示したカラー画像を図６に示す閾値ｔ１で２値化した画像を示し、（ｂ）は閾値ｔ２で２値化した画像を示す図である。
【図８】図７の領域７０３における輝度値のヒストグラムを示す図である。
【図９】図７の領域７０４における輝度値のヒストグラムを示す図である。
【図１０】領域情報を示す図である。
【図１１】図５に示したカラー画像の最終的な２値画像を示す図である。
【図１２】本実施形態の領域分割処理を説明するフローチャートである。
【図１３】図１２のステップＳ１２０１の要素抽出及び木構造化の処理を説明するフローチャートである。
【図１４】黒画素の８連結輪郭塊の例を示す図である。
【図１５】（ａ）黒画素塊の輪郭が四角形の場合と、（ｂ）黒画素塊の輪郭が非四角形である場合の例を示す図である。
【図１６】白画素の４連結輪郭塊の例を示す図である。
【図１７】枠（FRAME）と図（PICTURE）における内部白画素塊の配置例を示す図である。
【図１８】文書画像の例と、これをステップＳ１２０１の要素抽出、木構造化によって処理して得られる木構造の例を示す図である。
【図１９】図１２のステップＳ１２０２による木構造の変更処理を説明するフローチャートである。
【図２０】上述した木構造変更処理の例を説明する図である。
【図２１】ステップＳ１２０３における文字領域の作成処理を説明するフローチャートである。
【図２２】枠領域を持つ文書の領域分割処理例を示す図である。
【図２３】本実施形態により色付き下地領域を持つカラー文書が正しく領域分割される様子を示す図である。
【図２４】一般的な手法により色付きの下地領域を持つカラー文書に領域分割を行った様子を説明する図である。
【図２５】領域分割の一例を示す図である。[0001]
BACKGROUND OF THE INVENTION
The present invention relates to a document processing apparatus and method for performing digitization processing of a document, and more particularly to an image area separation processing executed in digitization processing of a document.
[0002]
[Prior art]
In recent years, with the progress of computerization of information, there is an increasing demand for storing or transmitting documents in electronic form instead of paper. In particular, due to the low price of storage media and the increase in communication bandwidth, documents to be digitized are expanding from black and white binary to full color documents.
[0003]
Document digitization here refers not only to photoelectric conversion of a paper document by a scanner or the like to convert it into image data, but also to the text, symbols, and figures that make up the document by recognizing the contents described. It is divided into areas with different properties such as photos, tables, etc., and the character part is character code information, the figure is vector data, the picture is image data, the table is structure data, etc. Refers to that.
[0004]
The first stage of electronic processing of such a document is to analyze the content written in a one-page document image and divide it into subelements of different properties such as characters, diagrams, photographs, tables, That is, it is a region division process. FIG. 25 is a diagram illustrating an example of area division.
[0005]
As an implementation example of such region division processing, “Method and Apparatus for character recognition” (Shin-Ywang et al./Canon K. K) of US Pat. No. 5,680,478 can be cited. In this example, a set of eight connected contour blocks of black pixels and four connected contour blocks of white pixels in a document image is extracted, and a character region, a picture or figure, a table, a frame is extracted from its shape, size, set state, and the like. , A characteristic area of a document such as a line is extracted. In the example of FIG. 25, character characteristic areas (blocks 1, 3, 4, 6), pictures and graphic areas (block 2), table areas (block 5), frames, and lines (7) are extracted. is doing.
[0006]
Here, the 8-connected outline block of black pixels (hereinafter referred to as black pixel block) is a collection of black pixels connected in any of eight directions from a certain black pixel as shown in FIG. Further, the 4-connected outline block of white pixels (hereinafter, white pixel block) refers to an aggregate of white pixels connected in any of four directions from a white pixel as shown in FIG.
[0007]
The above-described region division processing is premised on that the input document image is a monochrome binary because of its operation principle. Therefore, in order to perform color document region division using this technique, it is necessary to binarize the document image in advance. In general, binarization of a color image is performed by obtaining a threshold value from a luminance distribution of pixels and converting each pixel of the image into a white or black pixel with the above-described luminance threshold as a boundary.
[0008]
[Problems to be solved by the invention]
As a method for obtaining a threshold value for binarization of a color image, there are a method for obtaining a single screen and a method for obtaining each region. In the binarization method proposed in Japanese Patent Application No. 11-238581 by the present applicant, an optimum threshold value is dynamically obtained for each region according to the contents of the input document, and this is used for each region. The binarization optimal for is performed. In particular, it is possible to binarize a color image in which high-intensity characters on a low-luminance background and low-intensity characters on a high-luminance background are mixed, so that all of them are automatically converted to black characters on a white background. It is possible to obtain an optimal binary image as an input of the division process.
[0009]
FIG. 24 is a diagram for explaining a state in which region separation is performed on a document including a colored background by the previously proposed binarization method. In FIG. 24, a color document 2301 has a dark colored background area in the lower half, and light color characters are placed on it, and other than that, dark color characters are displayed on a light color background. It shall be. It can be seen that in such a document, the upper half and the lower half will be semantically separated.
[0010]
When binarization is performed on a color document such as the document 2301 by the above-described binarization method, a binary image as indicated by 2302 in FIG. 24 is generated. In the binary image 2302, the background is removed to make all white pixels, and the characters are all black pixels. At this time, if region segmentation processing is performed on the binary image 2302 as usual, the result shown in 2303 in FIG. 24 is obtained, because information on the underlying region existing in the lower half of the screen is missing. Originally, TEXT1 and TEXT2 are bonded to each other even though they should be separated into two at the center.
[0011]
That is, the character area range designation information based on the background color originally possessed by the color image is lost when binarization is performed.
[0012]
The present invention has been made in view of the above-described problems, and an object thereof is to enable region division while maintaining the distinction between regions represented by colors.
[0013]
[Means for Solving the Problems]
  In order to achieve the above object, a document processing apparatus according to an aspect of the present invention is provided.
  LuminanceDetermine multiple thresholds for binarizing an imageFirstA determination means;
  SaidFirstUsing each of the plurality of threshold values determined by the determining meansLuminanceObtaining means for obtaining a plurality of binary images by binarizing the images;
  Included in each of a plurality of binary images acquired by the acquisition meansBlack pixelConnected componentAreaOn the basis of the,Each with a different background brightnessArea location and sizeWhenTheShowGenerating means for generating area information;
  Second determination means for determining whether or not to invert the luminance value for each region indicated by the region information in the luminance image;
  For each area indicated by the area information in the luminance image, setting means for setting a threshold for binarizing each area;
  For each region indicated by the region information in the luminance image, with respect to the region determined to be inverted by the second determining unit, a corresponding region in the luminance image whose luminance value has been inverted is determined by the setting unit. A binary partial image is obtained by binarizing with a threshold value set for the area, while the luminance value is inverted for the area determined not to be inverted by the second determining means. A binary partial image acquisition unit that acquires a binary partial image by binarizing a corresponding region in the non-luminance image with a threshold value set for the region by the setting unit;
  By overwriting the binary partial image of each area acquired by the binary partial image acquisition means on the corresponding area in the binary image obtained by binarizing the entire surface of the luminance image with a predetermined threshold, the luminance image ConcerningBinarization means for obtaining a second binary image;
  Second binary image obtained by the binarization meansExtract document elements fromRegion information generated by the generating meansEach area indicated by is used as a frame element, a tree structure is obtained based on the document element and the frame element, and the luminance image is obtained based on the tree structure.Division processing means for performing area division;With
  The tree structure obtained by the division processing means is a tree structure in which document elements included in each area indicated by the area information among the extracted document elements are descendants of frame elements corresponding to the areas..
[0014]
  Also, a document processing method of the present invention for achieving the above objectIs
  FirstThe decision meansLuminanceDetermine multiple thresholds for binarizing an imageFirstA decision process;
  Acquisition meansFirstUsing each of the plurality of threshold values determined in the determination stepLuminanceAn acquisition step of acquiring a plurality of binary images by binarizing the images;
  The generation means is included in each of the plurality of binary images acquired in the acquisition step.Black pixelConnected componentAreaOn the basis of the,Each with a different background brightnessArea location and sizeWhenTheShowA generation step of generating region information;
  A second determination step in which a second determination unit determines whether to invert the luminance value for each region indicated by the region information in the luminance image;
  A setting step in which a setting unit sets a threshold for binarizing each region for each region indicated by the region information in the luminance image;
  The binary partial image acquisition means, for each region indicated by the region information in the luminance image, for the region determined to be inverted in the second determination step, the correspondence in the luminance image with the luminance value inverted. A binary partial image is obtained by binarizing the area to be processed with the threshold value set for the area in the setting step, and on the other hand, the region determined not to be inverted in the second determination step Is a binary partial image that obtains a binary partial image by binarizing a corresponding region in the luminance image that has not been inverted in luminance value with the threshold value set for the region in the setting step. Acquisition process;
  Binarization meansBy overwriting the binary partial image of each region acquired in the binary partial image acquisition step on the corresponding region in the binary image obtained by binarizing the entire surface of the luminance image with a predetermined threshold, the luminance image ConcerningA binarization step for obtaining a second binary image;
  A second binary image obtained in the binarization step by the division processing means;Extract document elements fromRegion information generated in the generating stepEach area indicated by is used as a frame element, a tree structure is obtained based on the document element and the frame element, and the luminance image is obtained based on the tree structure.A division processing step for performing area division;Have
The tree structure obtained in the division processing step is a tree structure in which document elements included in each area indicated by the area information among the extracted document elements are descendants of frame elements corresponding to the areas..
[0015]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, preferred embodiments of the present invention will be described with reference to the accompanying drawings.
[0016]
FIG. 1 is a diagram showing an overview of a document processing apparatus according to the present embodiment. In FIG. 1, reference numeral 101 denotes a computer apparatus that executes a document digitization processing program including a program for realizing processing described below with reference to the attached flowchart. The computer device 101 is accompanied by a display device 102 for displaying a situation and an image to a user, and an input device 103 configured to include a pointing device such as a keyboard and a mouse for receiving a user operation. As the display device for the display 102, a CRT, an LCD, or the like is used. A scanner device 104 optically reads a document image, digitizes it, and sends the obtained image data to the computer device 101. In this embodiment, a color scanner is used.
[0017]
FIG. 2 is a block diagram showing the configuration of the document processing apparatus according to this embodiment. In FIG. 1, reference numeral 201 denotes a CPU, which realizes various functions including electronic processing described later by executing a control program stored in a ROM 202 or a RAM 203. A ROM 202 stores various control programs executed by the CPU 201 and data. A RAM 203 stores various control programs executed by the CPU 201 and provides a work area necessary for the CPU 201 to execute various processes. Reference numeral 204 denotes an external storage device which stores a control program for realizing the processing described with reference to the attached flowchart by the CPU 101, document image data obtained by reading the image input device 104, and the like. A computer bus 205 connects the above-described components.
[0018]
FIG. 3 is a diagram showing an outline of document digitization processing by the document processing apparatus of this embodiment. In the flow of digitization processing according to the present embodiment, first, a color document to be digitized is read by the input unit 301 using the scanner 104 and stored in the external storage device 204 as image data. Next, the binarization unit 302 performs binarization processing on the document image data stored in the external storage device 204 for the subsequent area division processing. The region dividing unit 303 extracts elements such as characters, diagrams, tables, frames, lines, and the like from the binary image obtained by the binarizing unit 302 and divides them into regions. The digitized document creation unit 304 creates a digitized document by using character recognition data or using table structure data based on the attribute of each divided element. The output unit 305 stores the generated digitized document in the external storage device 204. Note that the output form in the output unit 305 is not limited to storage in the external storage device 204, and is displayed on the display 102 or output to other devices on the network via a network interface (not shown). It is also possible to output to a printer (not shown).
[0019]
Hereinafter, the operation of the binarization unit 302 shown in FIG. 3 will be described with reference to the drawings. FIG. 4 is a flowchart for explaining binarization processing according to this embodiment. FIG. 5 is a diagram showing an example of a color document image used for explaining the present embodiment. The color document image shown in FIG. 5 includes three background colors (background A501, background B502, background C503), the background A501 portion is white, the background B502 portion is yellow, and the background C503 portion is blue. And On the other hand, as for the character color, character string A504 and character string B505 are both black, and character string C506 is white.
[0020]
First, in step S401, a color document image to be processed is converted into a luminance image. Here, in general, the color format of the original image is RGB format, the luminance image is a gray scale format of 0 to 255 per pixel, and the luminance Y of each pixel is Y = the pixel values R, G, and B of the original image. Although it is determined by 0.299R + 0.587G + 0.114B, it goes without saying that other forms / conversion formulas may be used. As a result, the luminance of each part of the background color A501, background color B502, and background color C503 in FIG. Therefore, the peak of the histogram curve as shown in FIG. 6 has a certain width.
[0021]
Next, in step S402, a histogram of the luminance image data obtained in step S401 is taken. FIG. 6 is a diagram showing a histogram of luminance values obtained from the color image shown in FIG. If a histogram is obtained, a plurality of threshold values are determined from the histogram in step S403. For example, two threshold values t1 and t2 are extracted from the histogram shown in FIG.
[0022]
The plurality of threshold values t1 and t2 can be determined by the following procedure, for example. Reference is made from the luminance value 255 (white) to 0 (black) of the histogram. For example, the following conditions are set as conditions for selecting a threshold value from the referenced luminance value.
[0023]
Condition 1: In the histogram curve, a total frequency (area) included between the current reference point and a point that goes back by, for example, 10 pixels in the direction in which the luminance value on the horizontal axis increases from the reference point is a predetermined value. Greater than the first value.
[0024]
Condition 2: A rapid decrease in the frequency of the vertical axis in the histogram curve occurs (for example, “greater than a predetermined first slope” once or “greater than a predetermined second slope” twice in succession. Occurrence).
[0025]
Condition 3: A gentle decrease or increase in the histogram curve occurs (smaller than a predetermined third slope).
[0026]
In this embodiment, each of the points t1 and t2 shown in FIG. 6 satisfies “condition 2” after satisfying “condition 1” among the above conditions, and further satisfies “condition 3” thereafter. Detected as a result. The reference point may refer to the histogram curve at a predetermined interval. Moreover, although the above conditions are used in this embodiment, the determination of a threshold value is not restricted to this. For example, the case where the following conditions 4 to 6 are simultaneously satisfied may be selected as the threshold value.
[0027]
Condition 4: In the histogram curve, the total frequency (area) included between the current reference point and a point that goes back, for example, 40 pixels from the reference point in the direction in which the luminance value on the horizontal axis increases is a predetermined second value. Greater than the value of.
[0028]
Condition 5: The frequency of the vertical axis at the current reference point on the histogram curve is smaller than a predetermined third value.
[0029]
Condition 6: In the histogram curve, the histogram value at the current reference point and a point that goes back, for example, 20 pixels from the reference point in the direction of increasing luminance value on the horizontal axis is larger than a predetermined fourth value.
[0030]
In step S404, a plurality of binary images are created from each threshold value. These are temporary binary images. In this example, the binary image 701 shown in FIG. 7A is obtained by binarizing with the threshold value t1, and the binary image 702 shown in FIG. 7B is obtained by binarizing with the threshold value t2. Generated. Here, regions 703 and 704 are regions of bases B502 and C503 shown in FIG. At the threshold value t1, the background B502 and the background C503 are also black, and at the threshold t2, the background 502 is white and the background C503 is black.
[0031]
Next, in step S405, the black area in each binary image obtained in step S404 is extracted, and this is recorded as area information indicating areas having different background colors. This is a process of tracking a connected component of black pixels on a binary image and extracting a rectangular region having a certain size or more. In the example of FIG. 7, the region 703 is extracted from the binary image 701, and the region 704 is extracted from the binary image 702.
[0032]
Next, in step S406, the analysis in each area extracted in step S405 is performed. Here, the range of each region extracted in step S405 is extracted from the luminance image obtained in step S401, a histogram of luminance values is taken for each range, and is it necessary to invert the luminance value for each region from this histogram? Judgment is made and if necessary, the luminance value is inverted. Further, an optimum binarization threshold value in each region is obtained from this histogram in the same manner as in FIG. 6 (however, the predetermined value used for the judgment is different because the size of the target region is small). The obtained binarization threshold value and inversion information are output so as to accompany each area information. Whether or not inversion is necessary can be determined by obtaining an average value (average) of the histogram and its skew (skew), for example, by a calculation formula shown below.
[0033]
(average) = Σip (i),
σ²= Σ (i-av)²p (i),
(skew) = 1 / σ²Σ (i-av)^Threep (i),
However, Σip (i) represents the sum of ip (i) from i = 0 to i = 255, p (i) represents the probability density, and av represents the average value.
[0034]
When the absolute value of the skew is larger than the threshold value, it is determined that the area includes characters, and when it is determined that the area further includes characters and the skew is positive, it is determined that the image of the area should be inverted and inverted. The flag becomes yes.
[0035]
In this example, the histogram shown in FIG. 8 is obtained from the region 703, it is determined that no inversion is necessary, and the threshold value t21 is obtained from the histogram. Furthermore, the histogram shown in FIG. 9 is obtained from the region 704 and it is determined that it is necessary to invert, and the threshold t22 is obtained by inverting the histogram of FIG. 9 (FIG. 9 represents the histogram before inversion and the threshold t22. Is shown as the corresponding value in the histogram before inversion). Then, as shown in FIG. 10, region information in which a coordinate value, a binarization threshold value, and inversion information are set as one set is stored for each region.
[0036]
In this example, the area information 1001 for the area 703 is obtained from the histogram of FIG. 8 by using, as coordinate values, the left end x-coordinate l1, the right end x-coordinate r1, the upper end t1, and the lower end b1 as binarization threshold values. The threshold t21 has “no inversion” (no) as the inversion information. Similarly, the area information 1002 for the area 704 includes a threshold value obtained from the histogram of FIG. 9 using the x coordinate l2 at the left end of the rectangle, the x coordinate r2 at the right end, the t2 at the upper end, and the b2 at the lower end as the binarization threshold values. Inversion information “yes” is included as inversion information t22.
[0037]
Next, in step S407, a final binary image is generated from the luminance image obtained in step S401 based on the region information obtained in step S406. According to this example,
(1) A binary image 701 is obtained by binarizing the entire surface with a threshold value t1.
(2) The luminance image of the area 703 on the binary image 701 is processed according to the contents of the area information 1001 (in this case, binarized with the threshold value (t21)), and the obtained image is converted into the area 703 Overwrite to. As a result, a binary image 702 is obtained.
(3) For the region 704, since the corresponding region information 1002 indicates that there is inversion, the inversion processing is performed on the luminance image of the portion of the region 704, and the image obtained by binarization using the threshold value t22 is displayed in the region 704 Overwrite the part. As a result, a final binary image as shown in FIG. 11 is obtained from the color image shown in FIG.
[0038]
Next, the region dividing unit 303 performs region division using the binary image and region information obtained by the binarizing unit 302 as described above. Hereinafter, the process of the area dividing unit 303 will be described.
[0039]
FIG. 12 is a flowchart for explaining the region division processing of this embodiment. In step S1201, the region dividing unit 303 extracts document elements from the binarized document image obtained by the binarizing unit 302, and creates a tree structure representation thereof. Details of the processing in step S1201 will be described with reference to the flowchart of FIG. FIG. 13 is a flowchart for explaining the element extraction and tree structuring processing in step S1201 of FIG.
[0040]
In step S1301, all black pixel blocks are extracted from the binarized image. Note that the black pixel block is an 8-connected outline block of black pixels as described above, and is a set of black pixels having an outline formed by pixels that are in contact with each other vertically and horizontally as shown in FIG. . In the subsequent step S1302, it is determined whether or not the extracted black pixel block has a size equal to or smaller than a predetermined threshold with respect to a maximum character height and width (values obtained experimentally in advance) expected in advance. If it is below, the process advances to step S1308 to determine that the black pixel block is a character element. This is called “CHAR”.
[0041]
In step S1303, it is determined whether the extracted black pixels are vertically long or horizontally long with a certain ratio or more. If the black pixel block is vertically long or horizontally long with a certain ratio or more, it is determined as “LINE” in step S1309. In step S1304, attention is paid to the outline formed by the black pixels in the extracted black pixel block, and if the shape is a thin diagonal line, the process proceeds to step S1309, and the black pixel block is referred to as “LINE”. judge.
[0042]
In step S1305, it is checked whether the outline shape of the black pixel block is a quadrangle. FIG. 15 is a diagram illustrating an example in which (a) the outline of a black pixel block is a rectangle, and (b) the outline of a black pixel block is a non-rectangle. If it is determined in step S1305 that the outline shape of the black pixel block is not a rectangle, the process proceeds to step S1312, and the black pixel block is determined to be “PICTURE”.
[0043]
On the other hand, if the black pixel block is a square, the process proceeds to step S1306. In step S1306, four connected outline chunks of white pixels existing inside the black pixel chunk are extracted. The 4-connected outline block of white pixels is a set of white pixels having an outline formed by pixels that are in contact only in the vertical and horizontal directions as shown in FIG. Hereinafter, this set is called a white pixel block.
[0044]
In step S1307, it is determined whether or not the shape of the white pixel block extracted from the black pixel block in step S1306 is all quadrangular, and the black pixel block is filled without a gap at a predetermined interval. If the result of this determination is YES, the process advances to step S1311 to determine that the black pixel block is “FRAME”. FIG. 17 is a diagram illustrating an arrangement example of internal white pixel blocks in a frame (FRAME) and a diagram (PICTURE). According to the determination in step S1307, since (a) and (b) satisfy the determination condition in step S1307, it is determined in step S1311 that the frame is a frame (FRAME).
[0045]
In addition, the arrangement of white pixel blocks shown in FIG. 17C is that “the shape of the white pixel blocks extracted from the black pixel blocks are all square and fills the black pixel blocks without any gaps”. The condition is not satisfied, and it is determined as a diagram (PICTURE) in step S1312. Eventually, in step S1312, a black pixel block that does not meet any of the conditions up to step S1307 is set as “PICTURE”.
[0046]
In the present embodiment, an element “GROUND” that is a parent of each element is introduced. Assuming that the entire screen is a single “GROUND”, all the elements extracted so far from the image are expressed as the elements of the child. Then, each of the white pixel blocks extracted from the inside of the black pixel block determined as “FRAME” is set to “GROUND”, and further, the above-described steps S1301 to S1312 are performed inside the white pixel. Are extracted (step S1313). When “FRAME” is further extracted inside “FRAME”, this is further recursively processed as GROUND.
[0047]
At the time when all the recursive internal searches are completed, the elements extracted from the image form a tree structure. FIG. 18 is a diagram showing an example of a document image and an example of a tree structure obtained by processing this by element extraction and tree structuring in step S1201. As shown in FIG. 18A, the document image 1801 has text strings (CHAR) 1802, 1807, 1808, a frame (FRAME) 1804, and diagrams (PICTURE) 1803, 1809.
[0048]
When this document image 1801 is made into a tree structure by the above-described processing, it is as shown in FIG. GROUND 1821 indicates the entire document image 1801, and one of its elements, FRAME 1824, corresponds to a frame 1804 in the document image 1801. The frame 1804 is further divided into two frames, which are indicated as GROUND 1825 and 1826 on the tree structure, respectively.
[0049]
As described above, when the tree structure of the document image element is obtained in step S1201 of FIG. 12, the area information acquired in step S406 is applied to the tree structure in step S1202. That is, the tree structure obtained in step S301 is changed with reference to the area information recorded in the binarization process by the binarization unit 302. The processing in step S1202 will be described using the flowchart in FIG.
[0050]
In step S1901, it is checked whether area information exists. If there is no area information, this process is terminated as it is. If the area information exists, the process proceeds to step S1902, and the area is virtually considered as a set of “FRAME” and “GROUND” and is inserted into an appropriate portion of the element tree structure. In other words, a new “FRAME” corresponding to the rectangle represented by the region information is inserted at a position having “GROUND” as a parent, which contains the region, and a new “GROUND” is placed as a child of that.
[0051]
In step S 1903, among the siblings having the same parent (GROUND) as “FRAME” inserted in step S 1902, all elements located inside the area are moved to their descendants, that is, new “GROUND” children.
[0052]
FIG. 20 is a diagram illustrating an example of the tree structure changing process described above. The document image 2001 has a colored background area 2002. The binarization unit 302 binarizes the document image 2001 to generate a binary image 2010 and also generates area information 2020 corresponding to the area 2002 (FIG. 4). The region dividing unit 303 generates the tree structure 2030 having the entire image as GROUND 2031 by performing the processing described in the flowchart of FIG. 13 on the binary image 2010 (S1201). Then, the area information 2020 is applied to the tree structure 2030 to change the tree structure to obtain the tree structure 2040.
[0053]
More specifically, a frame (FRAME) corresponding to the region 2002 is inserted with GROUND 2031 as a parent, and GROUND 2041 is placed as a child thereof. Then, by arranging each element (A, B, C, figure) included in the region 2002 as a child of the GROUND 2041, the tree structure is changed.
[0054]
When the process of step S1202 is completed as described above, the process proceeds to step S1203. In step S1203, character elements are grouped to create lines and character areas. The process of step S1203 will be described using the flowchart of FIG. FIG. 21 is a flowchart for explaining the character region creation processing in step S1203.
[0055]
In step S2101, each “CHAR” is grouped together with each other having a horizontal distance within the threshold with the adjacent “CHAR”. This group is called “TEXTLINE”. However, these groupings are performed only between “CHAR” s having the same parent.
[0056]
Next, in step S2102, the “TEXTLINE” comrades that are adjacent to each other and whose vertical distance is within the threshold are grouped. The group of “TEXTLINE” obtained in this way is called “TEXT” or character area. However, these groupings are performed only between “TEXTLINE” consisting of “CHAR” having the same parent.
[0057]
With the above processing, each element of the document image is “TEXT” which is a character area, “LLINE” which is a line image part, “PICTURE” which is a picture or picture area, and “FRAME” which is a table or frame area. It is divided into.
[0058]
The grouping at the time of creating the character area in step S2103 is performed in consideration of the tree structure. For example, as shown in FIG. 22, if there is a solid line frame = “FRAME1” on the binary image, “FRAME1” Each character inside and outside must be in a different group. That is, they are grouped into different character areas such as “TEXT1” and “TEXT3”, “TEXT2” and “TEXT4”.
[0059]
Similarly, in the case of color image processing as shown in FIG. 23, even if there is no information that forms a frame on the image after binarization processing, the region obtained during binarization processing by the processing in step S1202 The information is reflected in the area division, and the characters on the color background become different character areas, and a correct area division result similar to FIG. 22 is obtained. That is, the binarization processing unit 302 obtains a binary image 2310 of the document image 2301 and area information 2320. When the binary image 2310 is divided into regions, the region information 2320 is reflected by the processing in step S1202, and a region division result is obtained as indicated by 2330.
[0060]
If the area dividing process is performed without performing the process of step S1202, the background information is not reflected, and an erroneous character area is obtained as shown in FIG.
[0061]
As described above, according to the present embodiment, when the color image is binarized and the area division processing is performed, the analysis of the area division processing is performed using the area information of the colored background stored at the time of the binarization processing. Since the contents are changed, it is possible to extract a character area that correctly reflects information lost in binarization in color image division processing, and more accurate region division processing is possible.
[0062]
According to the above embodiment, the area information stored by the binarizing unit 302 and the “FRAME” area processed by the area dividing unit 303 are limited to squares. An area such as a circle or an ellipse may be targeted. Even in this case, it is possible to extract a character area that accurately reflects information that has been lost in binarization in a color image, and it is possible to perform more accurate area division processing.
[0063]
Note that the present invention can be applied to a system including a plurality of devices (for example, a host computer, an interface device, a reader, and a printer), and a device (for example, a copying machine and a facsimile device) including a single device You may apply to.
[0064]
Another object of the present invention is to supply a storage medium (or recording medium) in which a program code of software that realizes the functions of the above-described embodiments is recorded to a system or apparatus, and the computer (or CPU or CPU) of the system or apparatus. Needless to say, this can also be achieved by the MPU) reading and executing the program code stored in the storage medium. In this case, the program code itself read from the storage medium realizes the functions of the above-described embodiments, and the storage medium storing the program code constitutes the present invention. Further, by executing the program code read by the computer, not only the functions of the above-described embodiments are realized, but also an operating system (OS) running on the computer based on the instruction of the program code. It goes without saying that a case where the function of the above-described embodiment is realized by performing part or all of the actual processing and the processing is included.
[0065]
Furthermore, after the program code read from the storage medium is written into a memory provided in a function expansion card inserted into the computer or a function expansion unit connected to the computer, the function is determined based on the instruction of the program code. It goes without saying that the CPU or the like provided in the expansion card or the function expansion unit performs part or all of the actual processing and the functions of the above-described embodiments are realized by the processing.
[0066]
【The invention's effect】
As described above, according to the present invention, it is possible to divide a region while maintaining the distinction between regions represented by colors.
[Brief description of the drawings]
FIG. 1 is a diagram showing an overview of a document processing apparatus according to an embodiment.
FIG. 2 is a block diagram showing a configuration of a document processing apparatus according to the present embodiment.
FIG. 3 is a diagram illustrating an outline of document digitization processing by the document processing apparatus according to the embodiment;
FIG. 4 is a flowchart illustrating binarization processing according to the present embodiment.
FIG. 5 is a diagram illustrating an example of a color document image used for describing the embodiment.
6 is a diagram showing a histogram of luminance values obtained from the color image shown in FIG. 5. FIG.
7A illustrates an image obtained by binarizing the color image illustrated in FIG. 5 with the threshold value t1 illustrated in FIG. 6, and FIG. 7B illustrates an image binarized using the threshold value t2.
8 is a diagram showing a histogram of luminance values in a region 703 in FIG.
FIG. 9 is a diagram showing a histogram of luminance values in a region 704 in FIG.
FIG. 10 is a diagram showing area information.
11 is a diagram showing a final binary image of the color image shown in FIG. 5. FIG.
FIG. 12 is a flowchart illustrating area division processing according to the present embodiment.
FIG. 13 is a flowchart for describing element extraction and tree structuring processing in step S1201 of FIG. 12;
FIG. 14 is a diagram illustrating an example of 8-connected contour blocks of black pixels.
FIGS. 15A and 15B are diagrams illustrating an example in which the outline of a black pixel block is a square, and FIG. 15B is a case in which the outline of a black pixel block is a non-rectangle.
FIG. 16 is a diagram illustrating an example of four connected outline blocks of white pixels.
FIG. 17 is a diagram illustrating an arrangement example of internal white pixel blocks in a frame (FRAME) and a diagram (PICTURE).
FIG. 18 is a diagram showing an example of a document image and an example of a tree structure obtained by processing this by element extraction and tree structuring in step S1201.
FIG. 19 is a flowchart for describing tree structure change processing in step S1202 of FIG. 12;
FIG. 20 is a diagram illustrating an example of the tree structure changing process described above.
FIG. 21 is a flowchart for describing character area creation processing in step S1203;
FIG. 22 is a diagram illustrating an example of region division processing for a document having a frame region.
FIG. 23 is a diagram illustrating a state in which a color document having a colored background area is correctly divided according to the present embodiment.
FIG. 24 is a diagram illustrating a state in which region division is performed on a color document having a colored background region by a general method.
FIG. 25 is a diagram illustrating an example of region division.

Claims

First determination means for determining a plurality of threshold values for binarizing the luminance image;
Obtaining means for obtaining a plurality of binary images by binarizing the luminance image using each of the plurality of threshold values determined by the first determining means;
Generation means for, based on the area of the connected component of black pixels, and generates area information in which the luminance of the background indicating the position and size of each different regions included in each of the plurality of binary images obtained by the obtaining means,
Second determination means for determining whether or not to invert the luminance value for each region indicated by the region information in the luminance image;
For each area indicated by the area information in the luminance image, setting means for setting a threshold for binarizing each area;
For each region indicated by the region information in the luminance image, with respect to the region determined to be inverted by the second determining unit, a corresponding region in the luminance image whose luminance value has been inverted is determined by the setting unit. A binary partial image is obtained by binarizing with a threshold value set for the area, while the luminance value is inverted for the area determined not to be inverted by the second determining means. A binary partial image acquisition unit that acquires a binary partial image by binarizing a corresponding region in the non-luminance image with a threshold value set for the region by the setting unit;
By overwriting the binary partial image of each area acquired by the binary partial image acquisition means on the corresponding area in the binary image obtained by binarizing the entire surface of the luminance image with a predetermined threshold, the luminance image and binarizing means for obtaining a second binary image about,
A document element is extracted from the second binary image obtained by the binarizing means, each area indicated by the area information generated by the generating means is set as a frame element, and the document element, the frame element, A division processing means for obtaining a tree structure based on the image and dividing the luminance image into regions based on the tree structure ;
The tree structure obtained by the division processing means is a tree structure in which document elements included in each area indicated by the area information among the extracted document elements are descendants of a frame element corresponding to each area. Feature document processing device.

Said generating means, the region area of the connected component is greater than the predetermined size of the black pixels included in each of the plurality of binary images, said brightness of the base is determined to different regions, the luminance of the background is different areas the document processing apparatus according to generate area information indicating a position and size in claim 1, wherein the.

The setting means, on the basis of the histogram of the region indicated by the region information in the luminance image, according to claim 1 or 2, characterized in that setting a threshold for binarizing the respective regions, respectively Document processing device.

Said second determining means, based on the histogram of the region indicated by the region information in the luminance image, to claim 1, wherein the determining whether to perform inversion of the brightness values for the respective areas 4. The document processing apparatus according to any one of items 3.

The division processing means includes
And forming means for forming a tree structure from the second binary image obtained by said binarizing means extracts said document element,
Each area indicated by the area information is inserted as a frame element in the tree structure formed by the forming means, and each document element indicated by the area information among the extracted document elements corresponds to each area. Change means for changing the tree structure by moving to the descendants of the elements of the frame to be
The document processing apparatus according to claim 1, wherein region division is performed based on the tree structure changed by the changing unit.

The document processing apparatus according to the color image in any one of claims 1 to 5, wherein the obtaining further Bei converting means for converting the luminance image.

Said first determining means takes a histogram of the intensity image, a document processing apparatus according to any one of claims 1 to 6, wherein determining said plurality of threshold values from the histogram.

First determining means, a first determination step of plural determining a threshold for binarizing the brightness image,
An acquisition unit that acquires a plurality of binary images by binarizing the luminance image using each of the plurality of threshold values determined in the first determination step;
Generating means, based on the area of the connected component of black pixels contained multiple binary images respectively acquired by the acquisition step, and generates area information in which the luminance of the background indicating the position and size of each different region Generation process;
A second determination step in which a second determination unit determines whether to invert the luminance value for each region indicated by the region information in the luminance image;
A setting step in which a setting unit sets a threshold for binarizing each region for each region indicated by the region information in the luminance image;
The binary partial image acquisition means, for each region indicated by the region information in the luminance image, for the region determined to be inverted in the second determination step, the correspondence in the luminance image with the luminance value inverted. A binary partial image is obtained by binarizing the area to be processed with the threshold value set for the area in the setting step, and on the other hand, the region determined not to be inverted in the second determination step Is a binary partial image that obtains a binary partial image by binarizing a corresponding region in the luminance image that has not been inverted in luminance value with the threshold value set for the region in the setting step. Acquisition process;
The binarizing means overwrites the binary partial image of each area acquired in the binary partial image acquiring step on the corresponding area in the binary image obtained by binarizing the entire surface of the luminance image with a predetermined threshold. A binarization step for obtaining a second binary image related to the luminance image ;
The division processing means extracts a document element from the second binary image obtained in the binarization step, uses each region indicated by the region information generated in the generation step as a frame element, and the document element Obtaining a tree structure based on the frame elements and dividing the luminance image into regions based on the tree structure ,
The tree structure obtained in the division processing step is a tree structure in which document elements included in each area indicated by the area information among the extracted document elements are descendants of frame elements corresponding to the areas. Characteristic document processing method.

Computer
First determination means for determining a plurality of threshold values for binarizing the luminance image;
An acquisition unit for acquiring a plurality of binary images by binarizing the luminance image using each of the plurality of threshold values determined by the first determination unit, and a plurality of binary images acquired by the acquisition unit, respectively. generating means based on the area of the connected component of black pixels, for generating area information indicating the position and size of each area where the luminance of the background is different contained,
Second determination means for determining whether or not to invert the luminance value for each region indicated by the region information in the luminance image;
Setting means for setting a threshold value for binarizing each area for each area indicated by the area information in the luminance image;
For each region indicated by the region information in the luminance image, with respect to the region determined to be inverted by the second determining unit, a corresponding region in the luminance image whose luminance value has been inverted is determined by the setting unit. A binary partial image is obtained by binarizing with a threshold value set for the area, while the luminance value is inverted for the area determined not to be inverted by the second determining means. A binary partial image obtaining unit that obtains a binary partial image by binarizing a corresponding region in the non-luminance image with a threshold value set for the region by the setting unit;
By overwriting the binary partial image of each area acquired by the binary partial image acquisition means on the corresponding area in the binary image obtained by binarizing the entire surface of the luminance image with a predetermined threshold, the luminance image binarizing means for obtaining a second binary image about,
A document element is extracted from the second binary image obtained by the binarizing means, each area indicated by the area information generated by the generating means is set as a frame element, and the document element, the frame element, The tree structure is obtained based on the document element included in each area indicated by the area information among the extracted document elements as a descendant of a frame element corresponding to each area. A division processing means for dividing an area of the luminance image based on the tree structure ,
A computer-readable storage medium storing a program for functioning as a computer.