JP4509423B2 - Image processing method and image processing apparatus

Info

Publication number: JP4509423B2
Application number: JP2001167016A
Authority: JP
Inventors: 大輔阿部; 宣浩綱島; 守人塩原
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2001-06-01
Filing date: 2001-06-01
Publication date: 2010-07-21
Anticipated expiration: 2021-06-01
Also published as: JP2002358529A

Description

【０００１】
【発明の属する技術分野】
本発明は、映像中の文字や記号を自動で読み取り可能にする画像処理方法および画像処理装置に関し、特に、抽出された複数枚の画像から高解像度で文字や記号を読み取ることができる画像処理に関する。
【０００２】
【従来の技術】
従来から、例えば、学校や会社などにおいて、建物の出入り口にカメラを設置しておき、出入りする人物の身体に付けられた名札の文字を読み取ることが行われている。この読み取られた名札の文字に基づいて、建物への人の出入を管理するシステムが設置されていた。
【０００３】
また、製品生産工場などの製造現場において、出荷物をベルトコンベア上に流して仕分けする際、現場に設置されているカメラにより、ベルトコンベア上を流れ移動している荷物の映像を撮影し、その映像から荷物に貼られているシールの文字を読み取ることで、移動する各荷物を自動的に送り先ごとに仕分けする物流監視システムなどが使用されている。
【０００４】
映像中で移動している文字や記号を読み取る技術を、例えば、図１に示されるような、ベルトコンベア２、照明装置３、カメラ４、認識装置５、そしてモニタを備えた物流監視システムを用いて説明する。
生産工場などにおいて、ベルトコンベア２の傍らに設置されているカメラ４によって、移動する出荷物１の映像を撮影し、認識装置５でその映像から出荷物１に貼られているラベル７の文字を読み取り、出荷物１を送り先などに仕分けするというように、出荷物１の物流監視が行われている。
【０００５】
映像中のラベル７における文字や記号を読み取る方法は、例えば、テンプレートマッチング（テレビジョン学会編、「画像工学 −画像のエレクトロニクス−」、pp.132-133）と呼ばれる画像処理方法を用いていた。この技術は、物体中の文字が書かれている部分を、予め「辞書」として持っている文字のテンプレートと照合させ、辞書のテンプレートと最も類似しているものを読み取り、その結果を出力している。
【０００６】
この技術による読み取り方法では、カメラ４と出荷物１とが離れているため、ラベル７に書かれている文字が小さくなると、文字自体が潰れてしまい、他の文字と似通ったものとなり、読み取ったときの文字認識が困難になる。その結果、ある程度大きく映っている文字を対象とせざるを得ない。そのため、この技術により、ラベル７上の文字や記号を読み取る場合、文字や記号を大きく撮影する必要があり、カメラ４の撮影範囲が限定されることとなる。しかし、例えば、物流監視システムにおいては、ベルトコンベア２上を移動する出荷物１には、様々な大きさのものがあることや、出荷物１の種類によっては貼られたラベル７の位置が異なるなどの理由により、確実にラベル７を検出できるようにするために、カメラ４の撮影範囲をできるだけ広くとる必要性がでてくる。
【０００７】
そこで、この技術を用いてより広い範囲を監視するためには、例えば、特開平１１−２８４９８１号公報に見られるように、広い範囲を複数の領域に分け、複数台のカメラを用いて各カメラに各領域を割り当てるようにすることが開示されている。しかし、この手法では、カメラ台数が増加するばかりでなく、その増加に伴って照明装置、処理装置等の周辺機器も増えるため、コスト面や設置場所の確保などの問題があった。
【０００８】
【発明が解決しようとする課題】
以上述べたように、従来技術による読み取り方法では、「カメラの台数」と「文字や記号の読み取り可能範囲」とにトレードオフの関係があった。そのため、少ないカメラ台数で広い範囲を撮影した場合、対象となる文字や記号の解像度が低くなり、文字や模様の情報が不足するため、一部分が潰れたり欠けたりして、それらを読み取ることができないという問題がある。
【０００９】
この問題を解決するために、画像処理により画像の解像度を上げる高解像度化技術が開発（例えば、青木ら、「複数のディジタル画像からの超解像度処理」、第２回画像センシングシンポジウム、pp.65-70）されている。この高解像度化技術は、映像中の複数枚の画像を用い、各画像の画素の情報を足し合わせることで、高い解像度かつ鮮明な画像を生成するというものである。
【００１０】
ここで、高解像度化技術における高解像度化処理の原理を以下に説明する。
高解像度化とは、画像形成過程をモデル化し、その逆問題を解くことで形成された画像の原因となる高精細な画像を推定することである。具体的には、例えば、以下のような解法により高解像度画像を生成する。
Ｆ(X、Y)を理想的な高解像度画像とし、Ｆ(X、Y)を座標変換することによって得られるｋ番目のフレームにおける高解像度画像をＦ_k(X、Y)とする。つまり、座標変換式を
Ｘ＝Ｘ_k(x、y)、Ｙ＝Ｙ_k(x、y) …(1)
とすると、ｋ番目のフレームにおける高解像度画像は
Ｆ_k(X、Y)＝Ｆ(Ｘ_k(x、y)、Ｙ_k(x、y)) …(2)
となる。
【００１１】
ｋ番目の低解像度の観測画像をＧ_k(i、j)とおくと、
【００１２】
【数１】

【００１３】
ここで、ｗ(i、j;x、y)はＣＣＤ各画素の空間位置や開口特性によって定まる窓関数（ＰＳＦ）である。ここで整数値(i、j)に対して
ｗ(i、j;x、y)＝1、 i−0.5＜x＜i＋0.5、j−0.5＜y＜j＋0.5
＝0、その他 …(4)
であることを仮定する。これは各受光素子が正方形であり、隙間無く画像面を覆っていることを意味している。なお、ｗ(i、j;x、y)が別の形の関数であっても、以下の議論は成立する。(2)式及び(3)式より
【００１４】
【数２】

【００１５】
ここで、ｘ_k(X、Y)、ｙ_k(X、Y)は(1)式の座標変換の逆変換である。また、(δ(x、y)／δ(X、Y))は変数変換のヤコビアン(各場所毎に面積が何倍になっているかを表すスケールファクタ)であり、
ｗ_k(i、j;X、Y)
＝ｗ(i、j:ｘ_k(X、Y)、ｙ_k(X、Y))(δ(x、y)／δ(X、Y)) …(6)
とおくと
【００１６】
【数３】

【００１７】
が得られる。この式はｋ番目のフレームとして観測される画像Ｇ_k(i、j)と高解像度の理想画像Ｆ(X、Y)の関係を表す。観測画像Ｇ_k(i、j)が与えられたときに、その基となる高解像度の理想画像Ｆ(X、Y)を求めたいというのが解くべき問題である。この(7)式自身はＦ(X、Y)が任意に高解像度(X、Yが任意の実数)であっても成立するが、あまり解像度が高いと未知数の数が多すぎ、解が一意に定まらない。そこで、Ｆ(X、Y)が整数格子上でのみ値を持つ離散的な画像Ｈ(I、J)から先に定義した窓関数ｗ(I、J;X、Y)を使って
【００１８】
【数４】

【００１９】
で表されるものと仮定する。するとＦ(X、Y)は階段関数となり、未知数の数は整数格子点の数に削減される。なお、画像の離散表現は別の方法も有り得るが、任意の窓関数に対してこの議論は成り立つ。
【００２０】
【数５】

【００２１】
ここで、(11)式を最小二乗の意味で最適に満たす高解像度画像Ｈ(I、J)を求めればよい。つまり、次の評価関数
【００２２】
【数６】

【００２３】
を最小にするようなＨ(I、J)を求めればよい。
以上のような処理により、高解像度化画像が生成される。
しかしながら、このような高解像度化技術では、各画像の各画素を対応づけ画素値を重ね合わせることで、高解像度化画像を生成していくため、１画素以下の精度で各画像の位置を合わせる必要があり、各画像のうち１枚でもその位置がずれていると、鮮明な高解像度画像は得られない。このため、各画像の位置合わせが重要となる。しかし、対象の画像自体が低解像度になっている場合、画像中の対象の大きさが小さいため厳密に位置合わせを行うことは難しい。
【００２４】
そこで、本発明は、映像中を移動している対象物体の動きに合わせて、読み取りの対象物体の各画像で切り出された位置を拘束することで、画像全体を通して厳密な位置合わせを可能とし、高解像度をより向上する画像処理装置及び画像処理方法を提供することを目的とする。
【００２５】
【課題を解決するための手段】
以上の課題を解決するために、本発明では、移動する対象物体を固定されたカメラで撮影した画像から当該対象物体上の表示情報を自動で読み取る画像処理方法において、前記カメラで時系列に撮影された複数の画像毎に、予め設定された表示情報に係る配置情報に基づいて読取り対象となる表示情報を探索し、該探索された表示情報を含む所定範囲の画像領域に係る画像情報及び位置情報を記憶部に記憶する探索段階と、記憶された前記画像領域の各々の位置情報に基づいて、前記画像中を移動する前記対象物体の微小時間の移動経路である直線に沿って該位置情報を修正して、前記複数の画像毎における当該画像領域の座標を夫々求める位置修正段階と、求められた前記画像領域の各々の座標に基づいて前記画像領域の画像情報を座標変換して位置合わせを行う位置合わせ段階と、位置合わせされた前記各画像情報を重ね合わせて重ね合わせ画像を生成し、該重ね合わせ画像に基づいて前記表示情報に対する高解像度処理を行う高解像度化画像処理段階とを含めた。
【００２６】
そして、前記表示情報が文字又は記号を含み、前記探索段階では、前記複数の画像の各々において、前記文字又は記号の配置に係るレイアウト情報に基づいて前記表示情報を探索し、該探索された表示情報を含む前記画像領域を記憶するようにした。さらに、前記対象物体の移動が等速直線運動であるとして、前記各画像領域の位置情報が該等速直線上に沿った移動量が等しい位置に修正され、前記位置修正段階で求められた前記画像領域の座標が前記対象物体の移動経路から所定値を超えてずれているときには、前記位置合わせ段階において当該画像領域を位置合わせ処理の対象から除くようにした。さらに、前記高解像度化画像処理段階で得られた高解像度化画像に基づいて、前記表示情報に含まれる前記文字又は記号の認識を行う文字認識処理段階を含むこととした。
【００２７】
また、本発明では、上述した画像処理方法に対応して、移動する対象物体を固定されたカメラで撮影した画像から当該対象物体上の表示情報を自動で読み取る画像処理装置において、前記カメラで時系列に撮影された複数の画像を記憶する記憶手段と、予め記憶された表示情報に係るレイアウト情報に基づいて、前記記憶手段から読み出された前記複数の画像毎に、前記表示情報を探索し、該探索された表示情報を含む所定範囲の画像領域に係る画像情報及び位置情報を前記記憶手段に記憶する探索手段と、記憶された前記画像領域の各々の位置情報に基づいて、前記画像中を移動する前記対象物体の微小時間の移動経路である直線に沿って該位置情報を修正して、前記複数の画像毎における当該画像領域の座標を夫々求める位置修正手段と、求められた前記画像領域の各々の座標に基づいて前記画像領域の画像情報について座標変換して位置合わせを行う位置合わせ手段と、位置合わせされた前記各画像情報を重ね合わせて重ね合わせ画像を生成し、該重ね合わせ画像に基づいて前記表示情報に対する高解像度処理を行う高解像度化画像処理手段とを備えることとした。
【００２８】
【発明の実施の形態】
以下、本発明について、高解像度化技術を用いた文字認識処理装置に適用した場合の実施形態を説明する。
〔第１の実施形態〕
図１に示されるように、固定されたカメラ４によって、ベルトコンベア２上に載せられて移動する荷物１を撮影すると、モニタ６に映し出される荷物１に係る画像は、ベルトコンベアの移動に従って画面中を移動していくとする。ここで、対象である移動する荷物１に係る画像を含む複数枚の各画像から文字らしい領域を、例えば、対象中の文字列の配置情報により切り出す。切り出した文字領域の位置には、荷物１の振動などによって多少の誤差を含むため、各画像間での誤差が累積され全体として大きなずれが発生する。
【００２９】
そこで、本実施形態では、映像中を移動している対象の動きを、微小時間において何らかの運動で表現し、例えば、ベルトコンベア２上の荷物１の運動は、微小時間においては「等速直線運動」で表されるとし、読み取りの対象である荷物１に貼られているラベル７に係る各画像で切り出された位置が「等速直線運動」をするように対象の位置を拘束することで、画像全体を通して厳密な位置合わせを行うことができるようにした。このような位置合わせ処理により、良好な高解像度化処理が行うことができ、対象中の文字や記号の解像度を高くできるため、ラベル７に書かれた文字や記号を容易に読み取ることができる。
【００３０】
第１の実施形態による位置合わせ処理を含む文字認識処理装置のブロック構成を図２に示す。
文字認識処理装置１０には、カメラ４などの画像入力装置から得られるアナログ映像入力信号、又は画像記録装置からの出力などのディジタル映像入力信号が供給される。入力信号がアナログ映像信号である場合には、そのアナログ映像信号をＡ／Ｄ変換部１１でディジタル信号に変換した後に、或いは入力信号が画像記録装置からの出力などのディジタル映像信号である場合には、そのディジタル信号のまま入力される画像制御部１２を有する。そして、画像制御部１２で入力された画像を記憶する第１画像記憶部１３と、対象探索部１５や位置合わせ処理部１６で処理された結果を記憶する第２画像記憶部１４と、与えられたレイアウトに基づき、移動している対象を探索する対象探索部１５と、各時刻での画像における対象の位置を合わせる位置合わせ処理部１６と、各時刻での対象の位置を合わせた結果から高解像度化処理を行う高解像度化処理部１７と、高解像度化した画像に対して文字列や記号の読み取りを行う文字認識部１８を備えている。
【００３１】
以下、第１の実施形態に係る文字認識処理装置１０の動作について、ブロック毎に詳細に説明する。
・入力された画像のＡ／Ｄ変換
カメラ４などの画像入力装置から得られるアナログ映像をＡ／Ｄ変換部１１によりディジタル化し、後段の画像制御部１２へ出力する。ただし、入力映像がディジタル映像の場合は、Ａ／Ｄ変換部１１のない構成となる。
・Ａ／Ｄ変換された画像の制御
画像制御部１２は、Ａ／Ｄ変換部１１からの出力画像又は画像記録装置などからの出力画像によるディジタル映像信号を制御し、第１画像記憶部１３に画像を記憶するとともに、後段の対象探索部１５に送る。
・画像の記憶
第１画像記憶部１３には、画像制御部１２に入力されたディジタル映像信号に係る画像データを記憶する。また、ある程度の時間の画像データを記憶できるだけのメモリ容量を持っているため、後段の対象探索部１５で処理を行っている間も入力される画像データを記憶することができる。
・処理結果の記憶
第２画像記憶部１４には、対象探索部１５や位置合わせ処理部１６で処理された結果を記憶する。なお、第２画像記憶部１４を第１画像記憶部１３と別に示したが、記憶部としての構成としては、一つのものでもよく、説明の便宜上、個別の構成として示した。
・認識対象の探索
対象探索部１５では、画像制御部１２から送られた映像信号による画像に対象となる画像が含まれているかどうかを探す。
【００３２】
先ず、画像制御部１２から入力され、ある時刻ｔのタイミングで撮影された画像に、対象が存在するかどうかを調べる。具体的には、入力された画像に対して２値化、ラベリング処理を行うことで、図３（ａ）のように文字らしい部分を検出する。同図中では、文字らしい部分を、丸形状又は楕円形状で示した。次に、例えば、図３（ｂ）に示されるような、名札のように、予め、対象の文字列などの配置関係を示すレイアウト情報を与える。このとき、レイアウト情報は、どんな文字又は記号であるかは必要なく、それらの配置関係を判別することができる程度のものである。
【００３３】
その文字らしい部分の配置がレイアウト情報と類似している所定範囲の領域を見つける。レイアウト情報との類似度が、ある閾値以下の場合は、「対象なし」を表す信号を画像制御部１２に送り、画像制御部１２は、第１画像記憶部１３に記憶している次の時刻の画像を対象探索部１５へ出力し、次いで、対象探索部１５は、その画像に対して探索を行う。
【００３４】
その画像について、レイアウト情報との類似度がある閾値以上の場合は、類似していると判断し、図３（ａ）に示されるように、次の時刻における所定範囲を有する領域Ｐの四隅の座標(ｘ1、ｙ1)、(ｘ2、ｙ2)、(ｘ3、ｙ3)、(ｘ4、ｙ4)と、領域Ｐに係る画像データとを第２画像記憶部１４へ出力し記憶する。これと同時に、「対象あり」を表す信号を画像制御部１２に送る。
【００３５】
このようにして、画像制御部１２は、第１画像記憶部１３に記憶している時刻ｔが時刻ｔ0から時刻ｔn-1までであれば、各時刻の画像を対象探索部１５に順次入力する。そして、対象探索部１５では、順次入力される画像について順次探索処理を行う。ここで、入力される画像の枚数ｎは、高解像度化処理部１７で用いる枚数である。
【００３６】
次に、時刻ｔ1〜ｔn-1における対象の位置を追跡する。具体的には、例えば、時刻ｔ0で検出した対象の四隅の座標(ｘ10、ｙ10)、(ｘ20、ｙ20)、(ｘ30、ｙ30)、(ｘ40、ｙ40)から検出した領域ｐ0をテンプレートとしたテンプレートマッチングにより対象の追跡を実現する。追跡により得られた各時刻ｔiでの対象領域ｐiの四隅の座標(ｘ1i、ｙ1i)、(ｘ2i、ｙ2i)、(ｘ3i、ｙ3i)、(ｘ4i、ｙ4i)と、領域ｐiに係る画像データを第２画像記憶部１４へ出力し記憶していく。ここで、処理枚数がｎであれば、ｉ＝０〜（ｎ−１）である。
【００３７】
図４に、探索処理によって得られた対象領域の具体例を、ｎ＝４の場合について示した。対象領域が、ｐ0〜ｐ3の四角枠で示される。
以上の処理を、処理枚数分ｎだけ繰り返すことにより、各時刻における対象領域の四隅の座標が求まる。この四隅の座標を基に、後段の位置合わせ処理部１６で各画像の対象領域について位置合わせを行う。
・各画像の位置合わせ処理
対象探索部１５での処理が終了すると、第２画像記憶部１４に記憶された各時刻の画像と各画像での対象領域の四隅の座標が、位置合わせ処理部１６に順次入力される。入力された時刻ｔ0から時刻ｔn-1までのｎ枚の画像における対象の位置合わせを行う。
【００３８】
先ず、ｎ枚の画像のある一枚を取り出し、ある大きさに拡大した画像(以下基準画像と呼ぶ)を作成する。その画像に対応するように、残りの（ｎ−１）枚の画像の位置を合わせるための射影変換を求める。拡大する大きさが、高解像度化処理により生成される画像の大きさとなる。
一般に、２枚の画像間で４点の対応関係が与えられると、２枚の画像を結び付ける射影変換が定まる。そこで、ｍ番目(ｍ＜ｎ)の入力画像における４頂点を(ｘm1、ｙm1)、(ｘm2、ｙm2)、(ｘm3、ｙm3)、(ｘm4、ｙm4)とし、基準画像における座標を(ｘM1、ｙM1)、(ｘM2、ｙM2)、(ｘM3、ｙM3)、(ｘM4、ｙM4)とし、基準画像をｍ番目の入力画像に変換する射影変換は、
ｘmi＝(a1・ｘMi＋a2・ｙMi＋a3)／(a7・ｘMi＋a8・ｙMi＋a9)
ｙmi＝(a4・ｘMi＋a5・ｙMi＋a6)／(a7・ｘMi＋a8・ｙMi＋a9)
（４頂点であるので、i＝1、2、3、4） …(13)
となる。
【００３９】
対象探索部１５で求められた各画像の四隅の座標を、(13)式に代入する。このことにより、変換係数ａ1〜ａ9の９個の未知数に対して８個の線形方程式を与えることになるが、(13)式は、ａ1〜ａ9を定数倍しても変化しないスケール不変性があるので、例えば、ａ9＝1であるという制約を与えれば、解が一意的に定まる。以上の処理により、基準画像をｍ番目の入力画像に変換する射影変換が求められる。この逆変換をｍ番目の入力画像に施せば、基準画像と位置が合うことになる。
【００４０】
ここで、実際には、各画像の対象領域の解像度が低いため、求まった対象領域の位置には、誤差を含んでいる。そこで、対象物体の運動に対応して、位置合わせを修正する。移動している対象物体は、微小時間の間隔で見た場合、決められた所定の運動をしていると見ることができる。その運動を満足するように、各時刻で検出された対象領域の位置を修正してやることにより、対象領域の位置合わせを厳密に行うことができる。
【００４１】
具体的には、例えば、図４のように、時刻ｔ0〜ｔ3での各逆射影変換による対象領域ｐ0〜ｐ3の位置が表されている場合、図５に示されるように、微小時間における対象領域の運動を所定の運動に当て嵌める。例えば、対象物体の運動が等速直線運動と見なす。図５においては、破線Ｌによる矢印のように運動しているとする。同図は、対象物体の運動をモニタ６上の画面で見た状態で示している。
【００４２】
先ず、各時刻での対象領域の四隅の位置に対応する点のうち、領域ｐ0〜ｐ3の左上の点が領域を代表する位置であるとして、それらの座標を、図５中に丸印で示されるように、(ｘ10、ｙ10)、(ｘ11、ｙ11)、(ｘ12、ｙ12)、(ｘ13、ｙ13)とする。そこで、図６に示すように、図５に丸印で示されるような各対象領域の代表位置について、対象物体の運動を表す直線Ｌ上に並ぶように修正する。図示のように、各座標は、(ｘm10、ｙm10)、(ｘm11、ｙm11)、(ｘm12、ｙm12)、(ｘm13、ｙm13)に修正される。しかし、ここでは、各座標が直線Ｌ上に位置されたに過ぎず、まだ等速運動に修正されていない。
【００４３】
次に、各座標が、図７のように直線Ｌで示される等速運動をしているとするために、各時間間隔の移動量が等しくなるように修正される。修正された後の各座標は、図７に示されるように、(ｘM10、ｙM10)、(ｘM11、ｙM11)、(ｘM12、ｙM12)、(ｘM13、ｙM13)となって、対象領域ｐ0〜ｐ3が、直線Ｌに沿った等速直線運動となるように、各領域の座標が修正される。
【００４４】
このように、各時刻の座標を全体の運動により拘束することで、各時刻の画像間で生じる誤差を累積することがないため、大きなずれのない厳密な位置合わせを可能とする。また、対象の運動により拘束せずに、四隅の点の対応のみで射影変換を求める場合も含む。
以上のように、各時刻ｔ0〜ｔ3での対象の位置を合わせ、重ね合わした画像をつくり、その画像を高解像度化処理部１７へ出力する。
・重ね合わされた画像の高解像度化処理
高解像度化処理部１７では、位置合わせ処理部１６で生成されたｎ枚の画像に対して高解像度化処理を行う。ここで行う高解像度化とは、画像形成過程をモデル化し、その逆問題を解くことで形成された画像の原因となる高精細な画像を推定することであり、具体的には、前述の(1)乃至(12)式による高解像度化処理である。(12)式で表される評価関数を最小にするようなＨ(I、J)が求められる。このような処理により生成された高解像度化画像が、文字認識部１８に出力される。
・高解像度化画像からの文字認識
文字認識部１８は、高解像度化処理部１７から出力された画像を用いて文字認識を行う。
【００４５】
具体的には、例えば、文字レイアウトに含まれる可能性のある文字(漢字、数字、アルファベットなど)の大きさや位相の異なるパターンを認識するために、予め辞書として用意しておく。そこで、高解像度化処理を施された画像中の文字と辞書画像とのテンプレートマッチングを行い、マッチング度が高いものを文字などの認識結果として出力する。テンプレートマッチングの方法としては、入力画像（Ｘ、Ｙ）におけるｉ番目の画素Ｘ_i、Ｙ_iについて、(14)式の計算を行い、マッチング度Ｍが小さいほど似通った領域であるとする。ただし、Ｍ≧０である。
【００４６】
【数７】

【００４７】
以上の処理による文字などの認識結果を、ベルトコンベア２の下流において別途用意されている仕分け装置に出力する。その仕分け装置によって、対象物体である荷物１の仕分けを自動で行うことや、モニタ６に出力して作業員が荷物１の仕分けをすることなどのように、物流監視システムを構築できる。
〔第２の実施形態〕
第２の実施形態による文字認識処理装置１０のブロック構成を図８に示す。
【００４８】
図２に示された第１の実施形態による文字認識処理装置１０において、位置合わせ処理部１６の後段に、画像選択処理部１９を加えたものである。第１の実施形態では、対象が発見された時刻ｔ0からｔn-1までの各時刻での画像であるｎ枚の入力画像を用い高解像度化処理を行っている。
しかし、入力されたｎ枚の画像には、位置合わせがうまく行えていない画像など高解像度化処理に不適な画像も含まれている可能性が高い。そのような画像を高解像度化処理に用いた場合、悪い影響を与えることがある。
【００４９】
そこで、第２の実施形態では、ｎ枚の画像から位置ずれの大きい画像を除去することにより、良好な高解像度化が得られるようにした。
具体的には、例えば、上述した高解像度化処理における(12)式に示すテンプレートマッチングを、位置合わせを行った各画像と基準画像との間で行う、マッチング度Ｍの値が大きい画像ほど、位置ずれが大きいと考えられる。よって、画像選択処理部１９によりマッチング度を計算し、その値が閾値より大きい画像を除去した後で、比較的位置ずれが少ない画像に基づいて高解像度化処理を行う。
【００５０】
従って、高解像度化処理において、処理に不適当な画像を処理対象から取り除くことにより、一層精度の高い処理を行うことができる。
これまで、物流監視システムを例にして説明してきたが、これに限られず、建物への人の出入チェックシステムなど、移動する物に付けられている対象情報を認識する必要があるようなところに、本実施形態による複数画像を用いた高解像度化画像処理を適用することができる。
【００５１】
対象情報についても、文字又は記号だけでなく、物体中の特定形状などによる図形でもよい。この場合には、高解像度化処理部の後段にある文字認識部の代わりに、図形認識部を置く。
本実施形態による高解像度化処理を適用した文字認識処理装置の動作として、探索対象とする物体が等速直線運動に従って移動している例を挙げて説明したが、探索対象物体がカメラで撮影される範囲内で、決まった経路で移動することが把握されているならば、その経路を探索対象の運動としてそれに拘束するように各対象領域を修正するようにしてもよい。
【００５２】
また、これまで、カメラで撮影された画面内において、一つの対象物体の探索を行う例を説明したが、同時に２以上の対象物体を探索することもでき、この場合には、各対象物体のそれぞれの運動を予め決定しておくことにより、各対象物体について、一つのカメラによる撮影画像に基づいて広範囲の監視を実現することができる。
【００５３】
さらに、カメラによって撮影された画面が揺らいでいるような場合、対象物体が止まっていても、決まった経路として各対象領域を停止運動に拘束することで、より鮮明な高解像度化画像を作成することができる。
（付記１）移動する対象物体を撮影した映像から当該対象物体に係る対象情報を自動で読み取る画像処理方法であって、
前記映像における時系列の複数の画像毎に、前記対象情報を含む所定範囲の画像領域を探索する探索段階と、
探索された各々の前記画像領域の位置を、前記対象物体の移動に合った位置に修正して、各画像領域の位置合わせを行う位置合わせ段階と、
位置合わせされた前記各画像領域を重ね合わせて高解像度処理を行う高解像度化画像処理段階とを含むことを特徴とする画像処理方法。
（付記２）前記対象情報が文字又は記号を含み、
前記探索段階では、前記画像の各々において、前記文字又は記号に係るレイアウト情報に基づいて前記画像領域を探索することを特徴とする付記１に記載の画像処理方法。
（付記３）前記位置合わせ段階では、前記対象物体が微小時間毎に移動しているとし、前記画像領域の位置を前記時間に対応する前記対象物体の移動位置に従って修正し位置合わせを行うことを特徴とする付記１又は２に記載の画像処理方法。
（付記４）前記位置合わせ段階では、前記対象物体の移動が等速直線運動であるとして、前記各画像領域の位置を該等速直線上の位置に修正して位置合わせを行うことを特徴とする付記３に記載の画像処理方法。
（付記５）前記位置合わせ段階では、前記画像領域の位置が前記対象物体の移動位置から所定値を超えてずれているとき、当該画像領域を位置合わせ処理の対象から除くことを特徴とする付記１乃至４のいずれか一つに記載の画像処理方法。
（付記６）前記高解像度化画像に基づいて前記文字又は記号情報の認識を行う文字認識処理段階を含むことを特徴とする付記２乃至５のいずれか一つに記載の画像処理方法。
（付記７）移動する対象物体を撮影した映像から当該対象物体に係る対象情報を自動で読み取る画像処理装置であって、
前記映像における時系列の複数の画像毎に、前記対象情報を含む所定範囲の画像領域を探索する探索手段と、
探索された各々の前記画像領域の位置を、前記対象物体の移動に合った位置に修正して、各画像領域の位置合わせを行う位置合わせ手段と、
位置合わせされた前記各画像領域を重ね合わせて高解像度処理を行う高解像度化画像処理手段とを有することを特徴とする画像処理装置。
（付記８）前記対象情報が文字又は記号を含み、
前記探索手段は、前記画像の各々において、前記文字又は記号に係るレイアウト情報に基づいて前記画像領域を探索することを特徴とする付記７に記載の画像処理装置。
（付記９）前記位置合わせ手段は、前記対象物体が微小時間毎に移動しているとし、前記画像領域の位置を前記時間に対応する前記対象物体の移動位置に従って修正し位置合わせを行うことを特徴とする付記７又は８に記載の画像処理装置。
（付記１０）前記位置合わせ手段は、前記対象物体の移動が等速直線運動であるとして、前記各画像領域の位置を該等速直線上の位置に修正して位置合わせを行うことを特徴とする付記９に記載の画像処理装置。
（付記１１）前記位置合わせ手段には、前記画像領域の位置が前記対象物体の移動位置から所定値を超えてずれているとき、当該画像領域を位置合わせ処理の対象から除く画像選択手段を備えたことを特徴とする付記７乃至１０のいずれか一つに記載の画像処理装置。
（付記１２）前記高解像度化画像に基づいて前記文字又は記号情報の認識を行う文字認識処理手段を有することを特徴とする付記８乃至１１のいずれか一つに記載の画像処理装置。
【００５４】
【発明の効果】
本発明の効果は、映像中の文字や記号を含む移動している対象の各時刻での位置を、対象の運動により拘束することで厳密に合わせることができるため、高解像度化処理が良好に行え、自動で文字や記号を読み取ることができる。
それにより、カメラの撮影範囲を広げることができ、例えば、物流監視システムに適用した場合、複数台のベルトコンベアを１台のカメラでの監視が可能となり、システムの大幅なコストダウンにつながる。
【図面の簡単な説明】
【図１】物流監視システムの概略構成を示す図である。
【図２】第１の実施形態を適用した文字認識処理装置に係るブロック構成を示した図である。
【図３】レイアウト情報による認識対象の探索を説明する図である。
【図４】モニタに映し出された各時刻での認識対象の位置を示した図である。
【図５】微少時間における認識対象の運動を修正するための該対象画像の代表点を特定した状態を説明する図である。
【図６】各対象画像の代表点を直線上に並べ、認識対象の運動による画像位置の修正について説明する図である。
【図７】認識対象が等速直線運動となるように、各対象画像を各時間の移動量が等しくなるように修正した状態を示す図である。
【図８】第２の実施形態を適用した文字認識処理装置に係るブロック構成を示した図である。
【符号の説明】
１…荷物
２…ベルトコンベア
３…照明装置
４…カメラ
５…認識装置
６…モニタ
７…ラベル
１０…文字認識処理装置
１１…Ａ／Ｄ変換器
１２…画像制御部
１３…第１画像記憶部
１４…第２画像記憶部
１５…対象探索部
１６…位置合わせ処理部
１７…高解像度化処理部
１８…文字認識部
１９…画像選択処理部
[0001]
[Technical Field of the Invention]
The present invention relates to an image processing method and an image processing apparatus that enable characters and symbols in video to be read automatically, and in particular to image processing that can read characters and symbols at high resolution from a plurality of extracted images.
[0002]
[Prior art]
Conventionally, at schools and companies, for example, a camera has been installed at the entrance of a building to read the characters on name tags worn by people entering and leaving. Systems that manage entry to and exit from the building based on the characters read from the name tags have been installed.
[0003]
Also, at manufacturing sites such as product factories, when shipments are sorted while travelling on a belt conveyor, a camera installed at the site captures video of the packages moving on the belt conveyor. Logistics monitoring systems are in use that read the characters on the sticker attached to each package from that video and thereby automatically sort each moving package by destination.
[0004]
A technique for reading characters and symbols that move within video is explained here using, for example, a logistics monitoring system such as that shown in FIG. 1, which comprises a belt conveyor 2, a lighting device 3, a camera 4, a recognition device 5, and a monitor.
In a production factory or the like, the camera 4 installed beside the belt conveyor 2 captures video of the moving shipment 1, and the recognition device 5 reads the characters on the label 7 attached to the shipment 1 from that video. The shipment 1 is then sorted by destination or the like, and in this way logistics monitoring of the shipment 1 is performed.
[0005]
Characters and symbols on the label 7 in the video have been read using, for example, an image processing method called template matching (Institute of Television Engineers of Japan, ed., “Image Engineering: Image Electronics”, pp. 132-133). In this technique, the part of the object where characters are written is compared against character templates held in advance as a “dictionary”, the dictionary template that is most similar is taken as the reading, and that result is output.
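The dictionary lookup described above can be sketched as follows. This is a minimal illustration, assuming a sum-of-squared-differences score (smaller means more similar) and equal-sized patches; the function names and the toy 3x3 glyphs are hypothetical and do not come from the cited reference.

```python
import numpy as np

def match_score(patch, template):
    """Sum of squared differences (assumed measure); smaller means more similar."""
    diff = patch.astype(float) - template.astype(float)
    return float(np.sum(diff * diff))

def recognize(patch, dictionary):
    """Return the label of the dictionary template most similar to the patch."""
    return min(dictionary, key=lambda label: match_score(patch, dictionary[label]))

# Toy "dictionary" of 3x3 glyphs (hypothetical, for illustration only).
dictionary = {
    "I": np.array([[0, 1, 0], [0, 1, 0], [0, 1, 0]]),
    "L": np.array([[1, 0, 0], [1, 0, 0], [1, 1, 1]]),
    "T": np.array([[1, 1, 1], [0, 1, 0], [0, 1, 0]]),
}

# An "L" with one pixel missing, standing in for a degraded character.
noisy_patch = np.array([[1, 0, 0], [1, 0, 0], [1, 1, 0]])
result = recognize(noisy_patch, dictionary)
print(result)
```

With only a few pixels per character, a single lost pixel already changes the scores noticeably, which is the difficulty the following paragraphs describe for small characters.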
[0006]
With this reading method, the camera 4 and the shipment 1 are far apart, so when the characters written on the label 7 appear small, the characters themselves are crushed and come to resemble other characters, which makes character recognition difficult. As a result, only characters that appear reasonably large can be targeted. Reading the characters and symbols on the label 7 with this technique therefore requires photographing them at a large size, which limits the shooting range of the camera 4. In a logistics monitoring system, however, the shipments 1 moving on the belt conveyor 2 come in various sizes, and the position of the attached label 7 differs depending on the type of shipment 1. For such reasons, the shooting range of the camera 4 must be made as wide as possible so that the label 7 can be detected reliably.
[0007]
To monitor a wider range with this technique, it has been disclosed, for example in Japanese Patent Application Laid-Open No. 11-284981, that the wide range is divided into a plurality of regions and a plurality of cameras are used, with one region assigned to each camera. With this approach, however, not only does the number of cameras increase, but peripheral equipment such as lighting devices and processing devices increases along with it, which raises problems of cost and of securing installation space.
[0008]
[Problems to be solved by the invention]
As described above, the conventional reading method involves a trade-off between the “number of cameras” and the “range over which characters and symbols can be read”. Consequently, when a wide range is captured with a small number of cameras, the resolution of the target characters and symbols drops and information about the characters and patterns is lost, so parts of them are crushed or missing and they cannot be read.
[0009]
To solve this problem, resolution-enhancement techniques that raise image resolution by image processing have been developed (for example, Aoki et al., “Super-resolution processing from multiple digital images”, 2nd Image Sensing Symposium, pp. 65-70). Such a technique uses a plurality of images from the video and combines the pixel information of each image to generate a sharp, high-resolution image.
[0010]
Here, the principle of high resolution processing in the high resolution technology will be described below.
Resolution enhancement means modeling the image formation process and solving its inverse problem, thereby estimating the high-definition image that gave rise to the observed images. Specifically, a high-resolution image is generated by, for example, the following procedure.
Let F(X, Y) be the ideal high-resolution image, and let F_k(X, Y) be the high-resolution image in the k-th frame obtained by applying a coordinate transformation to F(X, Y). That is, if the coordinate transformation is
X = X_k(x, y), Y = Y_k(x, y) … (1)
then the high-resolution image in the k-th frame is
F_k(X, Y) = F(X_k(x, y), Y_k(x, y)) … (2)
[0011]
Letting G_k(i, j) denote the k-th low-resolution observed image, we have
[0012]
[Expression 1]
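The image for Expression 1 is missing from this text. Judging from the surrounding definitions, equation (3) presumably writes each observed pixel as the window-weighted integral of the k-th frame, with F_k read as a function of the observed-frame coordinates (x, y). The following is a reconstruction under that assumption, not the original typeset formula:

```latex
G_k(i, j) = \iint w(i, j; x, y)\, F_k(x, y)\, dx\, dy \qquad (3)
```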

[0013]
Here, w(i, j; x, y) is a window function (PSF) determined by the spatial position and aperture characteristics of each CCD pixel. For integer values (i, j), it is assumed that
w(i, j; x, y) = 1, if i−0.5 < x < i+0.5 and j−0.5 < y < j+0.5
= 0, otherwise … (4)
This means that each light-receiving element is square and that the elements cover the image plane without gaps. The following argument also holds when w(i, j; x, y) has a different form. From equations (2) and (3),
[0014]
[Expression 2]
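The image for Expression 2 is likewise missing. Substituting equation (2) into the observation model and changing the integration variables from (x, y) to (X, Y) would give a form consistent with the definition of w_k in equation (6); the following is a reconstruction on that basis and should be treated as an assumption:

```latex
G_k(i, j) = \iint w\bigl(i, j; x_k(X, Y), y_k(X, Y)\bigr)\, F(X, Y)\,
            \frac{\partial(x, y)}{\partial(X, Y)}\, dX\, dY \qquad (5)
```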

[0015]
Here, x_k(X, Y) and y_k(X, Y) are the inverse of the coordinate transformation in equation (1), and (∂(x, y)/∂(X, Y)) is the Jacobian of the change of variables (a scale factor indicating by how much the area is magnified at each location). Defining
w_k(i, j; X, Y)
= w(i, j; x_k(X, Y), y_k(X, Y)) (∂(x, y)/∂(X, Y)) … (6)
the following is obtained:
[0016]
[Expression 3]
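The image for this expression is missing. Given the definition of w_k(i, j; X, Y) in equation (6), equation (7) presumably reads as follows (a reconstruction, not the original typeset formula):

```latex
G_k(i, j) = \iint w_k(i, j; X, Y)\, F(X, Y)\, dX\, dY \qquad (7)
```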

[0017]
This equation expresses the relationship between the image G_k(i, j) observed as the k-th frame and the ideal high-resolution image F(X, Y). The problem to be solved is to find the underlying ideal high-resolution image F(X, Y) given the observed images G_k(i, j). Equation (7) itself holds even if F(X, Y) has arbitrarily high resolution (X and Y being arbitrary real numbers), but if the resolution is too high there are too many unknowns and the solution is not uniquely determined. Therefore, using the window function w(I, J; X, Y) defined above, F(X, Y) is built from a discrete image H(I, J) that has values only on the integer lattice:
[0018]
[Expression 4]
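The image for Expression 4 is missing. Since F(X, Y) is described as a step function built from the discrete image H(I, J) and the window function w(I, J; X, Y), equation (8) presumably has the following form (a reconstruction under that assumption):

```latex
F(X, Y) = \sum_{I} \sum_{J} H(I, J)\, w(I, J; X, Y) \qquad (8)
```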

[0019]
F(X, Y) is assumed to be expressed in this form. F(X, Y) then becomes a step function, and the number of unknowns is reduced to the number of integer lattice points. Other discrete representations of the image are possible, and this argument holds for any window function.
[0020]
[Expression 5]
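The image for this expression is missing, and the text refers only to its final result, equation (11). Substituting the discrete representation of F(X, Y) into the relation between G_k and F would give a linear system in H(I, J). The following is one consistent reconstruction; the intermediate numbering (9) and (10) and the symbol W_k are assumptions introduced here for illustration:

```latex
G_k(i, j) = \sum_{I} \sum_{J} H(I, J) \iint w_k(i, j; X, Y)\, w(I, J; X, Y)\, dX\, dY \qquad (9)

W_k(i, j; I, J) = \iint w_k(i, j; X, Y)\, w(I, J; X, Y)\, dX\, dY \qquad (10)

G_k(i, j) = \sum_{I} \sum_{J} W_k(i, j; I, J)\, H(I, J) \qquad (11)
```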

[0021]
Here, it suffices to find the high-resolution image H(I, J) that best satisfies equation (11) in the least-squares sense. In other words, consider the evaluation function
[0022]
[Expression 6]
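The image for this expression is missing. A least-squares evaluation function for the linear relation between the observed images G_k(i, j) and H(I, J) would take the form below, where W_k(i, j; I, J) is a name assumed here for the weight with which H(I, J) contributes to G_k(i, j), that is, the integral of w_k(i, j; X, Y) w(I, J; X, Y) over (X, Y). This is a reconstruction, not the original typeset formula:

```latex
E = \sum_{k} \sum_{i} \sum_{j}
    \Bigl[ G_k(i, j) - \sum_{I} \sum_{J} W_k(i, j; I, J)\, H(I, J) \Bigr]^2 \qquad (12)
```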

[0023]
The H(I, J) that minimizes this evaluation function is what must be found.
A high resolution image is generated by the processing as described above.
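The least-squares step can be illustrated with a one-dimensional toy problem. This is a minimal sketch under stated assumptions: a box window that averages `factor` neighbouring high-resolution samples, known integer shifts between frames, and NumPy's general least-squares solver. The function names and test signal are hypothetical and are not taken from the patent or the cited paper.

```python
import numpy as np

def observation_matrix(n_high, factor, shift):
    """Each row averages `factor` consecutive high-res samples, starting at `shift`."""
    n_low = (n_high - shift) // factor
    W = np.zeros((n_low, n_high))
    for i in range(n_low):
        start = shift + i * factor
        W[i, start:start + factor] = 1.0 / factor
    return W

def super_resolve(frames, shifts, n_high, factor):
    """Stack all frames into one linear system and minimize ||G - W H||^2."""
    W = np.vstack([observation_matrix(n_high, factor, s) for s in shifts])
    G = np.concatenate(frames)
    H, *_ = np.linalg.lstsq(W, G, rcond=None)
    return H

# Simulate two low-resolution frames of the same signal, offset by one sample.
true_H = np.array([0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0])
shifts = [0, 1]
frames = [observation_matrix(8, 2, s) @ true_H for s in shifts]

estimate = super_resolve(frames, shifts, n_high=8, factor=2)
print(np.round(estimate, 3))
```

The system here has fewer equations than unknowns, so the solver returns the minimum-norm solution that reproduces every observed frame rather than a guaranteed copy of `true_H`. This mirrors the remark above that too many unknowns leave the solution undetermined, and it also shows why the assumed shifts must be accurate: a wrong shift puts the wrong rows into `W`.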
With such a resolution-enhancement technique, however, the high-resolution image is generated by putting the pixels of the individual images into correspondence and superimposing their pixel values, so the images must be aligned with an accuracy of one pixel or better. If even one of the images is misaligned, a sharp high-resolution image cannot be obtained. Alignment of the images is therefore important. When the target image itself is of low resolution, however, the target occupies only a small area of the image, and precise alignment is difficult.
[0024]
The object of the present invention is therefore to provide an image processing apparatus and an image processing method that constrain the positions cut out in each image of the target object to be read so that they follow the motion of the target object moving in the video, thereby enabling precise alignment across all of the images and further improving resolution enhancement.
[0025]
[Means for Solving the Problems]
To solve the above problems, the present invention provides an image processing method that automatically reads display information on a moving target object from images of the target object captured by a fixed camera, the method including: a search step of searching, in each of a plurality of images captured in time series by the camera, for the display information to be read based on preset arrangement information relating to the display information, and storing in a storage unit image information and position information for an image region of a predetermined range that includes the display information found; a position correction step of correcting the position information, based on the stored position information of each image region, along a straight line that is the movement path of the target object over a minute time as it moves through the images, and thereby obtaining the coordinates of the image region in each of the plurality of images; an alignment step of aligning the image regions by applying a coordinate transformation to their image information based on the coordinates obtained for each image region; and a high-resolution image processing step of superimposing the aligned pieces of image information to generate a superimposed image and performing high-resolution processing on the display information based on the superimposed image.
[0026]
The display information includes characters or symbols, and in the search step the display information is searched for in each of the plurality of images based on layout information relating to the arrangement of the characters or symbols, and the image region including the display information found is stored. Further, the movement of the target object is treated as constant-velocity linear motion, and the position information of each image region is corrected to positions with equal displacement along that straight line. When the coordinates of an image region obtained in the position correction step deviate from the movement path of the target object by more than a predetermined value, that image region is excluded from the alignment processing in the alignment step. The method further includes a character recognition processing step of recognizing the characters or symbols included in the display information based on the high-resolution image obtained in the high-resolution image processing step.
[0027]
The present invention also provides, corresponding to the image processing method described above, an image processing apparatus that automatically reads display information on a moving target object from images of the target object captured by a fixed camera, the apparatus comprising: storage means for storing a plurality of images captured in time series by the camera; search means for searching for the display information in each of the plurality of images read from the storage means, based on layout information relating to the display information stored in advance, and for storing in the storage means image information and position information for an image region of a predetermined range that includes the display information found; position correction means for correcting the position information, based on the stored position information of each image region, along a straight line that is the movement path of the target object over a minute time as it moves through the images, and thereby obtaining the coordinates of the image region in each of the plurality of images; alignment means for aligning the image regions by applying a coordinate transformation to their image information based on the coordinates obtained for each image region; and high-resolution image processing means for superimposing the aligned pieces of image information to generate a superimposed image and performing high-resolution processing on the display information based on the superimposed image.
[0028]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, an embodiment in which the present invention is applied to a character recognition processing apparatus using a high resolution technology will be described.
[First Embodiment]
As shown in FIG. 1, when the fixed camera 4 photographs the package 1 as it moves on the belt conveyor 2, the image of the package 1 displayed on the monitor 6 moves across the screen as the belt conveyor moves. Character-like regions are cut out of each of the plurality of images that contain the moving package 1, for example by using arrangement information about the character strings on the target. The positions of the cut-out character regions contain some error caused by vibration of the package 1 and the like, so errors between images accumulate and a large overall misalignment results.
[0029]
In this embodiment, therefore, the motion of the target moving in the video is represented by some motion model over a minute time. For example, the motion of the package 1 on the belt conveyor 2 is taken to be “constant-velocity linear motion” over a minute time, and the position of the target is constrained so that the positions cut out in each image of the label 7 attached to the package 1 to be read follow “constant-velocity linear motion”. This enables precise alignment across all of the images. Such alignment allows good resolution enhancement and raises the resolution of the characters and symbols on the target, so the characters and symbols written on the label 7 can be read easily.
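One way to realize such a constraint is sketched below. It assumes equal frame intervals and fits p(t) = p0 + v·t to one representative point per cut-out region by ordinary least squares, so that the corrected points lie on a straight line with equal spacing. The least-squares fit is a stand-in chosen for illustration; the function names, the `max_deviation` parameter, and the sample coordinates are hypothetical.

```python
import numpy as np

def constrain_to_constant_velocity(points):
    """Replace noisy per-frame positions with a least-squares constant-velocity track."""
    points = np.asarray(points, dtype=float)
    t = np.arange(len(points), dtype=float)        # frame index as time
    A = np.column_stack([np.ones_like(t), t])      # model: p(t) = p0 + v * t
    coeffs, *_ = np.linalg.lstsq(A, points, rcond=None)  # rows: p0, v
    return A @ coeffs

def within_tolerance(points, corrected, max_deviation):
    """Flag frames whose detected position stays close to the fitted path."""
    distance = np.linalg.norm(np.asarray(points, dtype=float) - corrected, axis=1)
    return distance <= max_deviation

# Representative corner of the target region in four consecutive frames (made-up values).
detected = [(10.0, 5.2), (14.3, 4.9), (17.8, 5.1), (22.1, 4.8)]
corrected = constrain_to_constant_velocity(detected)
steps = np.diff(corrected, axis=0)                 # displacement per frame interval
keep = within_tolerance(detected, corrected, max_deviation=1.0)
print(corrected)
print(keep)
```

Because every corrected position comes from the same two fitted parameters, an error in one frame does not propagate to the next, which is the accumulation problem the constraint is meant to avoid. The `keep` mask illustrates how a frame that strays too far from the path could be left out of the alignment.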
[0030]
FIG. 2 shows a block configuration of the character recognition processing device including the alignment processing according to the first embodiment.
The character recognition processing device 10 is supplied with an analog video input signal obtained from an image input device such as the camera 4, or with a digital video input signal such as the output of an image recording device. It has an image control unit 12 that receives the input either after the A/D conversion unit 11 has converted it to a digital signal, when the input is an analog video signal, or directly as a digital signal, when the input is a digital video signal such as the output of an image recording device. It further comprises a first image storage unit 13 that stores the images input through the image control unit 12; a second image storage unit 14 that stores the results of processing by the target search unit 15 and the alignment processing unit 16; the target search unit 15, which searches for the moving target based on a given layout; the alignment processing unit 16, which aligns the position of the target across the images at each time; a high-resolution processing unit 17, which performs resolution enhancement on the aligned result; and a character recognition unit 18, which reads character strings and symbols from the resolution-enhanced image.
[0031]
Hereinafter, the operation of the character recognition processing device 10 according to the first embodiment will be described in detail for each block.
・A/D conversion of input images
Analog video obtained from an image input device such as the camera 4 is digitized by the A/D conversion unit 11 and output to the image control unit 12 in the subsequent stage. When the input video is digital, the A/D conversion unit 11 is omitted from the configuration.
・Control of A/D-converted images
The image control unit 12 controls the digital video signal, whether it is the output image from the A/D conversion unit 11 or an output image from an image recording device or the like, stores the images in the first image storage unit 13, and sends them to the target search unit 15 in the subsequent stage.
・Image storage
The first image storage unit 13 stores the image data of the digital video signal input to the image control unit 12. Because it has enough memory capacity to hold image data covering a certain period of time, it can continue to store incoming image data while the target search unit 15 in the subsequent stage is processing.
・Storage of processing results
The second image storage unit 14 stores the results of processing by the target search unit 15 and the alignment processing unit 16. Although the second image storage unit 14 is shown separately from the first image storage unit 13, the two may be implemented as a single storage unit; they are shown separately for convenience of explanation.
・Search for the recognition target
The target search unit 15 checks whether the image carried by the video signal sent from the image control unit 12 contains the target.
[0032]
First, the image input from the image control unit 12, captured at a certain time t, is examined to see whether a target is present. Specifically, binarization and labeling are applied to the input image to detect character-like portions, as in FIG. 3(a). In that figure, character-like portions are drawn as circles or ellipses. Next, layout information describing the arrangement of the target's character strings and the like is given in advance, as with the name tag shown in FIG. 3(b), for example. The layout information need not say which characters or symbols are present; it only needs to be sufficient to identify how they are arranged.
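The binarization and labeling step can be sketched as follows. This is a minimal illustration, assuming dark characters on a light label, a fixed threshold, and 4-connected labeling by breadth-first search; the function names, the threshold value, and the toy image are hypothetical.

```python
from collections import deque

import numpy as np

def binarize(gray, threshold):
    """Mark pixels darker than the threshold as foreground (assumes dark text)."""
    return np.asarray(gray) < threshold

def label_components(mask):
    """4-connected labeling; returns a label image and (top, left, bottom, right) boxes."""
    labels = np.zeros(mask.shape, dtype=int)
    boxes = []
    for r, c in zip(*np.nonzero(mask)):
        if labels[r, c]:
            continue                      # already part of an earlier component
        current = len(boxes) + 1
        labels[r, c] = current
        queue = deque([(r, c)])
        top, bottom, left, right = r, r, c, c
        while queue:
            y, x = queue.popleft()
            top, bottom = min(top, y), max(bottom, y)
            left, right = min(left, x), max(right, x)
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                inside = 0 <= ny < mask.shape[0] and 0 <= nx < mask.shape[1]
                if inside and mask[ny, nx] and not labels[ny, nx]:
                    labels[ny, nx] = current
                    queue.append((ny, nx))
        boxes.append((int(top), int(left), int(bottom), int(right)))
    return labels, boxes

# Toy grayscale image with two dark blobs on a light background.
gray = np.array([
    [200, 200, 200, 200, 200, 200],
    [200,  30,  30, 200,  40, 200],
    [200,  30,  30, 200,  40, 200],
    [200, 200, 200, 200, 200, 200],
])
labels, boxes = label_components(binarize(gray, 128))
print(boxes)
```

The bounding boxes play the role of the circles and ellipses in the figure: they record where character-like blobs sit, and only their arrangement is needed for comparison with the layout information.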
[0033]
A region of predetermined range in which the arrangement of the character-like portions resembles the layout information is then sought. When the similarity to the layout information is equal to or less than a certain threshold value, a signal indicating “no target” is sent to the image control unit 12. The image control unit 12 then outputs the image at the next time, stored in the first image storage unit 13, to the target search unit 15, which searches that image.
[0034]
If the degree of similarity with the layout information is greater than or equal to the threshold, the region is judged to be similar. As shown in FIG. 3A, the coordinates (x1, y1), (x2, y2), (x3, y3), (x4, y4) of the four corners of the region P of predetermined range at that time, together with the image data relating to the region P, are output to the second image storage unit 14 and stored. At the same time, a signal indicating “target present” is sent to the image control unit 12.
[0035]
In this way, the image control unit 12 sequentially inputs to the target search unit 15 the images stored in the first image storage unit 13 for times t0 to tn-1, and the target search unit 15 performs the search process on each input image in turn. Here, the number n of images to be input is the number used by the high resolution processing unit 17.
[0036]
Next, the position of the target at times t1 to tn-1 is tracked. Specifically, for example, tracking is achieved by template matching, using as the template the area p0 defined by the coordinates (x10, y10), (x20, y20), (x30, y30), (x40, y40) of the four corners of the target detected at time t0. The coordinates (x1i, y1i), (x2i, y2i), (x3i, y3i), (x4i, y4i) of the four corners of the target area pi at each time ti obtained by the tracking, and the image data relating to the area pi, are output to the second image storage unit 14 and stored. Here, if the number of images processed is n, i = 0 to (n−1).
[0037]
FIG. 4 shows a specific example of the target area obtained by the search process in the case of n = 4. The target area is indicated by square frames p0 to p3.
By repeating the above process for the n images to be processed, the coordinates of the four corners of the target area at each time are obtained. Based on these coordinates, the subsequent alignment processing unit 16 aligns the target area of each image.
・ Each image alignment process
When the processing in the target search unit 15 is completed, the images at each time stored in the second image storage unit 14 and the coordinates of the four corners of the target region in each image are sequentially input to the alignment processing unit 16, which aligns the target across the n input images from time t0 to time tn-1.
[0038]
First, one of the n images is taken out and enlarged to a certain size to create what is hereinafter called the reference image. Projective transformation is then performed on the remaining (n-1) images to bring their positions into correspondence with the reference image. The enlarged size is the size of the image to be generated by the high resolution processing.
In general, when four point correspondences are given between two images, the projective transformation relating the two images is determined. Let the four vertices in the m-th (m < n) input image be (xm1, ym1), (xm2, ym2), (xm3, ym3), (xm4, ym4), and the corresponding coordinates in the reference image be (xM1, yM1), (xM2, yM2), (xM3, yM3), (xM4, yM4). The projective transformation that converts the reference image into the m-th input image is then
xmi = (a1 · xMi + a2 · yMi + a3) / (a7 · xMi + a8 · yMi + a9)
ymi = (a4 · xMi + a5 · yMi + a6) / (a7 · xMi + a8 · yMi + a9)
(i = 1, 2, 3, 4, one for each vertex) (13)
[0039]
The coordinates of the four corners of each image obtained by the target search unit 15 are substituted into equation (13). This gives eight linear equations in the nine unknown conversion coefficients a1 to a9. Equation (13) is invariant to scale, however: it does not change when a1 to a9 are all multiplied by the same constant. Therefore, if a constraint such as a9 = 1 is imposed, the solution is uniquely determined. The above processing yields the projective transformation that converts the reference image into the m-th input image; applying its inverse to the m-th input image brings that image into registration with the reference image.
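As an illustration of how equation (13) with a9 = 1 reduces to an 8 × 8 linear system, the following sketch multiplies out the denominators and solves for a1 to a8. The function names (`solve_linear`, `projective_coefficients`, `apply_projective`) and the use of Gaussian elimination are assumptions of this sketch; the specification does not state how the system is solved.

```python
def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def projective_coefficients(ref_pts, img_pts):
    """Return [a1, ..., a8] of equation (13), with a9 fixed to 1.

    ref_pts: four (xM, yM) corners in the reference image.
    img_pts: the corresponding four (xm, ym) corners in the m-th input image.
    """
    A, b = [], []
    for (X, Y), (x, y) in zip(ref_pts, img_pts):
        # x * (a7*X + a8*Y + 1) = a1*X + a2*Y + a3, rearranged to be linear in a
        A.append([X, Y, 1, 0, 0, 0, -X * x, -Y * x])
        b.append(x)
        # y * (a7*X + a8*Y + 1) = a4*X + a5*Y + a6
        A.append([0, 0, 0, X, Y, 1, -X * y, -Y * y])
        b.append(y)
    return solve_linear(A, b)


def apply_projective(a, X, Y):
    """Map a reference-image point (X, Y) into the input image using equation (13)."""
    d = a[6] * X + a[7] * Y + 1.0
    return ((a[0] * X + a[1] * Y + a[2]) / d, (a[3] * X + a[4] * Y + a[5]) / d)
```

To resample the m-th input image onto the reference grid, one would evaluate `apply_projective` at each reference pixel and read the input image at the resulting position, which amounts to applying the inverse transformation to the input image.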
[0040]
Here, since the resolution of the target area of each image is actually low, the obtained position of the target area includes an error. Therefore, the alignment is corrected in accordance with the motion of the target object. When the moving target object is viewed at a minute time interval, it can be viewed as performing a predetermined movement. By correcting the position of the target area detected at each time so as to satisfy the motion, it is possible to precisely align the target area.
[0041]
Specifically, for example, when the positions of the target areas p0 to p3 at times t0 to t3 obtained by the respective inverse projective transformations are as shown in FIG. 4, the motion of the areas is fitted to the predetermined motion. For example, the motion of the target object is regarded as uniform linear motion, and in FIG. 5 the object is assumed to move along the arrow indicated by the broken line L. The figure shows the motion of the target object as seen on the screen of the monitor 6.
[0042]
First, among the points at the four corners of the target area at each time, the upper-left point of each of the areas p0 to p3 is taken as the position representing that area; these coordinates, indicated by circles in FIG. 5, are (x10, y10), (x11, y11), (x12, y12), (x13, y13). As shown in FIG. 6, the representative positions are then corrected so that they lie on the straight line L representing the motion of the target object, giving the corrected coordinates (xm10, ym10), (xm11, ym11), (xm12, ym12), (xm13, ym13). At this point, however, each coordinate merely lies on the straight line L and has not yet been corrected to uniform motion.
[0043]
Next, since each coordinate is assumed to move at constant speed along the straight line L, the coordinates are corrected so that the amount of movement in each time interval is equal. As shown in FIG. 7, the corrected coordinates are (xM10, yM10), (xM11, yM11), (xM12, yM12), (xM13, yM13), and the coordinates of the target areas p0 to p3 are thereby corrected so that the areas follow a uniform linear motion along the straight line L.
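The two-stage correction of FIGS. 6 and 7 can be sketched as below. The specification does not say how the straight line L or the equal movement amount is computed, so this sketch makes its own choices: L is taken as the total-least-squares line through the representative points, and the equal spacing is obtained by an ordinary least-squares fit of the position along L against the frame index, assuming equal time intervals between frames. The function name `fit_uniform_linear_motion` is hypothetical.

```python
import math


def fit_uniform_linear_motion(points):
    """Correct representative points to a uniform linear motion.

    points: list of (x, y), one per frame, in time order (equal intervals assumed).
    Returns (on_line, uniform):
      on_line - points projected onto the fitted straight line L (cf. FIG. 6)
      uniform - points on L with equal movement per time step (cf. FIG. 7)
    """
    n = len(points)
    if n < 2:
        raise ValueError("at least two points are required")
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    sxx = sum((p[0] - cx) ** 2 for p in points)
    syy = sum((p[1] - cy) ** 2 for p in points)
    sxy = sum((p[0] - cx) * (p[1] - cy) for p in points)
    # Direction of the principal axis of the point scatter (assumed choice of L).
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    dx, dy = math.cos(theta), math.sin(theta)

    # Stage 1: signed position of each point along L, then projection onto L.
    s = [(p[0] - cx) * dx + (p[1] - cy) * dy for p in points]
    on_line = [(cx + si * dx, cy + si * dy) for si in s]

    # Stage 2: least-squares fit s_i = s_mean + v * (i - t_mean), so that every
    # time step moves the point by the same amount v along L.
    t_mean = (n - 1) / 2.0
    s_mean = sum(s) / n
    var_t = sum((i - t_mean) ** 2 for i in range(n))
    v = sum((i - t_mean) * (s[i] - s_mean) for i in range(n)) / var_t
    uniform = []
    for i in range(n):
        si = s_mean + v * (i - t_mean)
        uniform.append((cx + si * dx, cy + si * dy))
    return on_line, uniform
```

The offset between each original representative point and its corrected position could then be applied to all four corners of the corresponding target area before the projective transformation is computed; that step is likewise an assumption of this sketch.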
[0044]
In this way, by constraining the coordinates at each time by the overall motion, errors arising between the images at each time do not accumulate, so that precise alignment without large shifts is possible. The present embodiment also covers the case where the projective transformation is obtained only from the correspondence of the four corner points, without the constraint by the motion of the object.
As described above, the positions of the objects at the times t0 to t3 are aligned, an overlapped image is created, and the image is output to the high resolution processing unit 17.
・ High resolution processing of superimposed images
The high resolution processing unit 17 performs high resolution processing on the n images generated by the alignment processing unit 16. The processing models the image formation process and solves the inverse problem to estimate the high-definition image that gave rise to the observed images. This is the high resolution processing according to equations (1) to (12): the H (I, J) that minimizes the evaluation function expressed by equation (12) is obtained. The resulting high resolution image is output to the character recognition unit 18.
・ Character recognition from high resolution images
The character recognition unit 18 performs character recognition using the image output from the high resolution processing unit 17.
[0045]
Specifically, for example, a dictionary is prepared in advance so that patterns differing in size and phase can be recognized for the characters (kanji, numerals, alphabetic letters, etc.) that may appear in the character layout. Template matching is then performed between the characters in the high-resolution image and the dictionary images, and the character with the highest degree of match is output as the recognition result. As the template matching method, the matching degree M of equation (14) is calculated from the i-th pixels X_i and Y_i of the images (X, Y) being compared, and the smaller M is, the more similar the regions are taken to be, where M ≧ 0.
[0046]
[Expression 7]
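Equation (14) is not reproduced in this text, so the following sketch substitutes the sum of absolute pixel differences as a stand-in matching degree. It shares the two properties stated above (M ≥ 0, and smaller M means more similar) but is an assumption, not the expression of the specification. The names `matching_degree` and `recognize` are likewise hypothetical.

```python
def matching_degree(region, template):
    """Stand-in matching degree M: sum of absolute pixel differences.

    region and template are equally sized 2-D lists of pixel values.
    M is 0 for identical images and grows as they differ.
    """
    return sum(
        abs(a - b)
        for row_r, row_t in zip(region, template)
        for a, b in zip(row_r, row_t)
    )


def recognize(region, dictionary):
    """Return (label, M) for the dictionary template with the smallest M."""
    scored = ((label, matching_degree(region, tmpl)) for label, tmpl in dictionary.items())
    return min(scored, key=lambda item: item[1])


# Tiny 3x3 binary "dictionary" used only for illustration.
DICTIONARY = {
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
}
```

In practice the dictionary would hold each character at several sizes and offsets, as the paragraph above describes, and the region would be cut from the high-resolution image.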

[0047]
The recognition result obtained by the above processing is output to a sorting device provided separately downstream on the belt conveyor 2. With this sorting device, a logistics monitoring system can be constructed in which, for example, the package 1 serving as the object is sorted automatically, or the result is displayed on the monitor 6 and a worker sorts the package 1.
[Second Embodiment]
FIG. 8 shows a block configuration of the character recognition processing apparatus 10 according to the second embodiment.
[0048]
In the second embodiment, an image selection processing unit 19 is added after the alignment processing unit 16 of the character recognition processing device 10 according to the first embodiment shown in FIG. 2. In the first embodiment, high resolution processing is performed using n input images, which are the images at each time from time t0 to time tn-1 at which the target was found.
However, it is highly likely that the input n images include an image that is not suitable for high resolution processing, such as an image that is not aligned well. When such an image is used for high resolution processing, it may have an adverse effect.
[0049]
Therefore, in the second embodiment, a better high resolution result is obtained by removing images with a large positional deviation from the n images.
Specifically, for example, the template matching shown in Expression (12) in the high resolution processing described above is performed between each aligned image and the reference image, and an image with a large matching value is regarded as having a large positional deviation. The image selection processing unit 19 therefore calculates the degree of matching and removes the images whose value is larger than a threshold value, and the high resolution processing is then performed on the remaining images, which have relatively little positional deviation.
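The selection performed by the image selection processing unit 19 amounts to a threshold filter, sketched below. The matching degree used here is a sum of absolute pixel differences, chosen as a stand-in because the expression referred to above is not reproduced in this section; the function name `select_images` and the threshold value in the usage example are assumptions.

```python
def select_images(aligned_images, reference, threshold):
    """Keep only aligned images whose matching degree to the reference is small.

    aligned_images: list of 2-D pixel lists, already aligned to the reference.
    reference: 2-D pixel list of the same size.
    Images whose matching degree is larger than the threshold are removed.
    """
    kept = []
    for image in aligned_images:
        # Stand-in matching degree: sum of absolute pixel differences.
        m = sum(
            abs(a - b)
            for row_i, row_r in zip(image, reference)
            for a, b in zip(row_i, row_r)
        )
        if m <= threshold:
            kept.append(image)
    return kept
```

For example, with a reference of `[[10, 10], [10, 10]]` and a threshold of 5, an aligned image `[[50, 10], [10, 10]]` (matching degree 40) would be dropped, while `[[10, 11], [10, 10]]` (matching degree 1) would be kept.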
[0050]
Therefore, in the high resolution processing, it is possible to perform processing with higher accuracy by removing an image inappropriate for processing from the processing target.
The logistics monitoring system has been described so far as an example, but the invention is not limited to it. The high resolution image processing using a plurality of images according to the present embodiment can be applied to any system that needs to recognize target information attached to a moving object, such as a system for checking people entering and leaving a building.
[0051]
The target information may be not only characters or symbols but also a figure with a specific shape in the object. In this case, a graphic recognizing unit is placed instead of the character recognizing unit at the subsequent stage of the high resolution processing unit.
As the operation of the character recognition processing apparatus to which the high resolution processing according to the present embodiment is applied, an example has been described in which the object to be searched moves in uniform linear motion. However, if it is known that the object moves along a predetermined route within the range captured by the camera, each target region may instead be corrected with that route as the motion constraint of the search target.
[0052]
In addition, the example in which one target object is searched for in the screen shot by the camera has been described so far. However, two or more target objects can be searched at the same time. By determining each motion in advance, it is possible to realize a wide range of monitoring for each target object based on a photographed image by one camera.
[0053]
In addition, when the picture taken by the camera is shaking while the target object itself is stationary, a clearer high-resolution image can be created by treating the stationary state as the predetermined path and constraining each target area to it.
(Supplementary Note 1) An image processing method for automatically reading target information related to a target object from a video obtained by capturing the moving target object,
A search stage for searching a predetermined range of image areas including the target information for each of a plurality of time-series images in the video,
An alignment step of correcting the position of each searched image area to a position that matches the movement of the target object and aligning each image area;
And a high-resolution image processing step of performing high-resolution processing by superimposing the aligned image regions.
(Supplementary Note 2) The target information includes characters or symbols,
The image processing method according to Supplementary Note 1, wherein, in the search stage, the image area is searched for in each of the images based on layout information relating to the characters or symbols.
(Supplementary Note 3) The image processing method according to Supplementary Note 1 or 2, wherein, in the alignment step, the target object is assumed to move at each minute time interval, and the alignment is performed by correcting the position of the image area according to the movement position of the target object corresponding to that time.
(Supplementary Note 4) The image processing method according to Supplementary Note 3, wherein, in the alignment step, the movement of the target object is assumed to be uniform linear motion, and the alignment is performed by correcting the position of each image area to a position on the straight line of that motion.
(Supplementary Note 5) The image processing method according to any one of Supplementary Notes 1 to 4, wherein, in the alignment step, when the position of the image area deviates from the movement position of the target object by more than a predetermined value, the image area is excluded from the alignment process.
(Supplementary Note 6) The image processing method according to any one of Supplementary Notes 2 to 5, further comprising a character recognition processing step of recognizing the character or symbol information based on the high resolution image.
(Supplementary Note 7) An image processing apparatus that automatically reads target information related to a target object from a video of the moving target object,
Search means for searching a predetermined range of image area including the target information for each of a plurality of time-series images in the video;
An alignment unit that corrects the position of each searched image area to a position that matches the movement of the target object, and aligns each image area;
An image processing apparatus comprising high resolution image processing means for performing high resolution processing by superimposing the aligned image regions.
(Supplementary Note 8) The target information includes characters or symbols,
The image processing apparatus according to Supplementary Note 7, wherein the search means searches for the image area in each of the images based on layout information relating to the characters or symbols.
(Supplementary Note 9) The image processing apparatus according to Supplementary Note 7 or 8, wherein the alignment means assumes that the target object moves at each minute time interval, and performs the alignment by correcting the position of the image area according to the movement position of the target object corresponding to that time.
(Supplementary Note 10) The image processing apparatus according to Supplementary Note 9, wherein the alignment means assumes that the movement of the target object is uniform linear motion, and performs the alignment by correcting the position of each image area to a position on the straight line of that motion.
(Supplementary Note 11) The image processing apparatus according to any one of Supplementary Notes 7 to 10, wherein the alignment means includes image selection means that excludes the image area from the alignment process when the position of the image area deviates from the movement position of the target object by more than a predetermined value.
(Supplementary Note 12) The image processing apparatus according to any one of Supplementary Notes 8 to 11, further comprising character recognition processing means for recognizing the character or symbol information based on the high resolution image.
[0054]
[Effect of the Invention]
According to the present invention, constraining the positions by the movement of the object allows the positions, at each time, of a moving object bearing characters or symbols in a video to be aligned precisely, so that the characters and symbols can be read automatically.
As a result, the imaging range of the camera can be expanded. For example, when applied to a physical distribution monitoring system, a plurality of belt conveyors can be monitored by a single camera, leading to a significant cost reduction of the system.
[Brief description of the drawings]
FIG. 1 is a diagram showing a schematic configuration of a physical distribution monitoring system.
FIG. 2 is a block diagram illustrating a character recognition processing apparatus to which the first embodiment is applied.
FIG. 3 is a diagram illustrating a recognition target search based on layout information.
FIG. 4 is a diagram illustrating a position of a recognition target at each time displayed on a monitor.
FIG. 5 is a diagram for explaining a state in which representative points of the target image for correcting the motion of the recognition target in a minute time are specified.
FIG. 6 is a diagram for explaining correction of an image position by movement of a recognition target by arranging representative points of each target image on a straight line.
FIG. 7 is a diagram illustrating a state in which each target image is corrected so that the amount of movement at each time is equal so that the recognition target has a constant linear motion.
FIG. 8 is a block diagram illustrating a character recognition processing apparatus to which a second embodiment is applied.
[Explanation of symbols]
1 ... Package
2 ... Belt conveyor
3 ... Lighting device
4 ... Camera
5 ... Recognition device
6 ... Monitor
7 ... Label
10 ... Character recognition processing device
11 ... A/D converter
12 ... Image control unit
13 ... First image storage unit
14 ... Second image storage unit
15 ... Target search unit
16 ... Alignment processing unit
17 ... High resolution processing unit
18 ... Character recognition unit
19 ... Image selection processing unit

Claims

An image processing method for automatically reading display information on a target object from images obtained by capturing the moving target object with a fixed camera, the method comprising:
a search step of searching each of a plurality of images taken in time series by the camera for the display information to be read, based on preset layout information relating to the display information, and storing in a storage unit the image information and position information of an image area of predetermined range that includes the display information found;
a position correction step of correcting, based on the stored position information of each image area, the position information along a straight line that is the movement path of the target object moving in the images over a minute time, and obtaining the coordinates of each image area;
an alignment step of aligning the image information of the image areas by coordinate conversion based on the obtained coordinates of the image areas; and
a high-resolution image processing step of generating a superimposed image by superimposing the aligned image information, and performing high-resolution processing on the display information based on the superimposed image.

The display information includes characters or symbols;
In the search step, the display information is searched for in each of the plurality of images based on layout information relating to the arrangement of the characters or symbols, and the image area including the searched display information is stored. The image processing method according to claim 1.

The image processing method according to claim 1 or 2, wherein, in the position correction step, the movement of the target object is assumed to be uniform linear motion, and the position information of each image area is corrected to positions at which the movement amounts along the straight line of that motion are equal.

The image processing method according to any one of claims 1 to 3, wherein, when the coordinates of the image area obtained in the position correction step deviate from the movement path of the target object by more than a predetermined value, the image area is excluded from the alignment process in the alignment step.

The image processing method according to any one of claims 2 to 4, further comprising a character recognition processing step of recognizing the characters or symbols included in the display information, based on the high resolution image obtained in the high-resolution image processing step.

An image processing apparatus that automatically reads display information on a target object from an image obtained by capturing the moving target object with a fixed camera,
Storage means for storing a plurality of images taken in time series by the camera;
Based on layout information relating to display information stored in advance, the display information is searched for each of the plurality of images read out from the storage means, and an image region in a predetermined range including the searched display information is searched. Search means for storing the image information and position information in the storage means;
Based on the stored position information of each of the image regions, the position information is corrected along a straight line that is a moving path of the target object moving in the image in a minute time, and Position correcting means for respectively obtaining the coordinates of the image area;
Alignment means for performing alignment by performing coordinate conversion on image information of the image area based on the obtained coordinates of the image area;
A high-resolution image processing unit that superimposes the aligned image information to generate a superimposed image, and performs high-resolution processing on the display information based on the superimposed image;
An image processing apparatus comprising:

The display information includes characters or symbols;
The search means searches the display information based on layout information relating to the arrangement of the characters or symbols in each of the plurality of images, and stores the image area including the searched display information. The image processing apparatus according to claim 6.

The image processing apparatus according to claim 6 or 7, wherein the position correcting means assumes that the movement of the target object is uniform linear motion and corrects the position information of each image area to positions at which the movement amounts along the straight line of that motion are equal.

The image processing apparatus according to any one of claims 6 to 8, wherein, when the coordinates of the image area obtained by the position correcting means deviate from the movement path of the target object by more than a predetermined value, the alignment means excludes the image area from the alignment process.

The image processing apparatus according to any one of claims 7 to 9, further comprising character recognition processing means for recognizing the characters or symbols included in the display information, based on the high resolution image obtained by the high-resolution image processing unit.