JP3590896B2

JP3590896B2 - Caption detection method

Info

Publication number: JP3590896B2
Application number: JP01561295A
Authority: JP
Inventors: 勝美谷口; 孝文宮武; 晃朗長坂
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1995-02-02
Filing date: 1995-02-02
Publication date: 2004-11-17
Anticipated expiration: 2019-11-17
Also published as: JPH08212231A

Description

【０００１】
【産業上の利用分野】
本発明は、字幕検出方法に関し、さらに詳しくは、画像中に字幕が有るか否かを判定する字幕検出方法に関する。
【０００２】
【従来の技術】
字幕検出方法については、次の従来技術がある。
特開平５−１３７０６６号公報には、ビデオ信号のエッジ成分を抽出してカラオケビデオ中の字幕部分と背景部分とを識別する技術が開示されている。
また、「大相撲対戦からの認識に基づく内容識別法、第４４回情報処理学会全国大会予稿集、２−３０１」には、画面を左部分と右部分とに分割し、左部分に縦書きされている字幕と右部分に縦書きされている字幕とから対戦力士を認識する技術が開示されている。
【０００３】
動画像の代表画像抽出装置については、次の従来技術がある。
特開平５−２４４４７５号公報では、フレーム間差分に基づいて画像の変化点を求め、その変化点を与える画像を代表画像として抽出する技術が提案されている。
【０００４】
その他の関連する従来技術として、特開平３−２７３３６３号公報，特開平３−２９２５７２号公報に開示の技術がある。
【０００５】
【発明が解決しようとする課題】
上記特開平５−１３７０６６号公報に開示の字幕検出方法は、字幕が横書きであることが前提であり、縦書きの字幕には対応できない。すなわち、カラオケビデオには対応できても、一般の画像には対応できない問題点がある。
また、上記「大相撲対戦からの認識に基づく内容識別法、第４４回情報処理学会全国大会予稿集、２−３０１」に開示の従来技術は、画面の左部分と右部分とに字幕がそれぞれ縦書きされていることが前提であり、やはり一般の画像には対応できない問題点がある。
そこで、本発明の第１の目的は、字幕の表示態様が任意である一般の画像に対して字幕が有るか否かを判定することが出来る字幕検出方法を提供することにある。
【０００６】
また、上記特開平５−２４４４７５号公報に開示の動画像の代表画像抽出装置では、画像の変化のみに着目して代表画像を抽出しているため、画像自体の変化は少ない場合には、必要な代表画像を抽出できない問題点がある。例えば、アナウンサーが複数のニュースを次々に読み上げているような画像の場合、画像自体の変化が少なく，字幕のみが変化するため、ニュースごとに代表画像を抽出することが出来ないことがある。
そこで、本発明の第２の目的は、字幕を検出し、その結果に基づいて代表画像を抽出することが出来る字幕検出方法を提供することにある。
【０００７】
【課題を解決するための手段】
【０００８】
第１の観点では、本発明は、画像を複数の領域に区分し、各領域別に第一の閾値以上の高輝度の画素数および第二の閾値以上の輝度値の差があるエッジ数を計数し、前記画素数が第三の閾値以上であり且つ前記エッジ数が第三の閾値以上の領域を字幕有りの領域と判別し、字幕有りの領域数を行方向および列方向に投影し、行方向に投影したときの字幕有りの領域数の最大値または列方向に投影したときの字幕有りの領域数の最大値が第四の閾値以上のときに画像中に字幕が有ると判定することを特徴とする字幕検出方法を提供する。
【０００９】
第２の観点では、本発明は、上記構成の字幕検出方法において、少なくとも過去２フレーム以上連続して同一場所に存在した高輝度の画素数およびエッジ数を計数することを特徴とする字幕検出方法を提供する。
【００１０】
第３の観点では、本発明は、上記構成の字幕検出方法において、水平方向の輝度差が第二の閾値以上のエッジと、垂直方向の輝度差が第二の閾値以上のエッジとを計数することを特徴とする字幕検出方法を提供する。
【００１１】
第４の観点では、本発明は、上記構成の字幕検出方法において、行方向に投影したときの字幕有りの領域数の最大値が、列方向に投影したときの字幕有りの領域数の最大値より大きい場合は、字幕が横書きであると判定し、そうでない場合は字幕が縦書きであると判定することを特徴とする字幕検出方法を提供する。
【００１２】
第５の観点では、本発明は、上記構成の字幕検出方法において、字幕有りと判定した画像の中から代表画像を選択することを特徴とする字幕検出方法を提供する。
【００１４】
第６の観点では、本発明は、上記構成の字幕検出方法において、字幕有りと判定した画像が時間的に連続するフレームであるとき、そのうちの一つのフレームの画像のみを代表画像として選択することを特徴とする字幕検出方法を提供する。
【００１５】
第７の観点では、本発明は、上記構成の字幕検出方法において、抽出した各代表画像を縮小して画面に並べて表示することを特徴とする字幕検出方法を提供する。
【００１６】
【作用】
上記第１の観点による字幕検出方法では、画像を複数の領域に区分し、各領域別に字幕の特徴量を算出し、それらの特徴量により各領域が字幕有りの領域か否かを判別する。そして、字幕有りの領域数を行方向および列方向に投影し、その投影結果に基づいて画像中に字幕が有るか否かを判定する。
これによれば、区分した領域別に字幕の有無を判別しているので、字幕の文字数が画面全体で少ない場合であっても、字幕の検出が可能である。また、字幕有りの領域数を行方向および列方向に投影し、その投影結果に基づいて画像中に字幕が有るか否かを判定しているので、字幕が横書きでも縦書きでも対応でき、字幕の表示位置の制限もない。従って、字幕の表示態様が任意である一般の画像に対して字幕が有るか否かを判定することが出来る。
【００１７】
さらに、上記第１の観点による字幕検出方法では、画像を複数の領域に区分し、各領域別に第一の閾値以上の高輝度の画素数および第二の閾値以上の輝度値の差があるエッジ数を計数し、前記画素数が第三の閾値以上であり且つ前記エッジ数が第三の閾値以上の領域を字幕有りの領域と判別する。そして、字幕有りの領域数を行方向および列方向に投影し、行方向に投影したときの字幕有りの領域数の最大値または列方向に投影したときの字幕有りの領域数の最大値が第四の閾値以上のときに画像中に字幕が有ると判定する。
これによれば、上記の作用に加えて、高輝度の画素数を計数しているので、背景よりも高輝度の画素で構成される文字を好適に判別できる。また、強エッジのエッジ数を計数しているので、背景よりもエッジの出現頻度の高い文字を好適に判別できる。そして、高輝度の画素数と強エッジのエッジ数を両方により領域に字幕が有るか無いかを判別しているので、高精度に判別できる。
【００１８】
上記第２の観点による字幕検出方法では、少なくとも過去２フレーム以上連続して同一場所に存在した高輝度の画素数およびエッジ数を計数する。
動画像では、背景の画素は変化しやすいが、字幕は視聴者が読み終るまで一定時間変化させずに表示される。そこで、過去のフレームと比較することにより、字幕にかかる画素やエッジを高精度に検出できる。
【００１９】
上記第３の観点による字幕検出方法では、水平方向の輝度差が第二の閾値以上のエッジと、垂直方向の輝度差が第二の閾値以上のエッジとを計数する。
例えば、窓のブラインドのような背景では、エッジが高頻度に出現する。しかし、水平方向のエッジまたは垂直方向のエッジの一方しか現われないので、両方を考慮することにより、窓のブラインドのような背景のエッジは計数されなくなり、誤判定を防止できる。
【００２０】
上記第４の観点による字幕検出方法では、行方向に投影したときの字幕有りの領域数の最大値が、列方向に投影したときの字幕有りの領域数の最大値より大きい場合は、字幕が横書きであると判定し、そうでない場合は字幕が縦書きであると判定する。
これにより、字幕の書式を検出できるようになる。
【００２１】
上記第５の観点による字幕検出方法では、字幕有りと判定した画像の中から代表画像を選択する。
このように字幕の有る画像を検出し、その中から代表画像を抽出するので、画像自体の変化が少なく，字幕のみが変化する動画像でも、代表画像を適切に抽出することが出来る。
【００２３】
上記第６の観点による字幕検出方法では、字幕有りと判定した画像が時間的に連続するとき、そのうちの一つのフレームの画像のみを代表画像として選択する。
これにより、例えば字幕の代り目の画像を抽出することが出来る。
【００２４】
上記第７の観点による字幕検出方法では、抽出した各代表画像を縮小して画面に並べて表示する。
これにより、複数の代表画像を一覧できるようになり、ユーザは簡単に所望のシーンを探し出すことが出来る。
【００２５】
【実施例】
以下、図を参照して本発明を詳細に説明する。なお、これにより本発明が限定されるものではない。
【００２６】
図１は、本発明の字幕検出方法を実施する動画像の代表画像抽出装置のシステム構成図である。
この動画像の代表画像抽出装置１０００において、ビデオ再生装置９は、動画像を再生するための光ディスクやビデオデッキ等の装置である。ビデオ再生装置９が扱う動画像の各フレームには、動画像の先頭から順にフレーム番号がつけられており、このフレーム番号がコンピュータ３から制御信号１０によってビデオ再生装置に送られることで、該当フレームの動画像が再生され、映像信号Ｖがビデオ入力装置１１へ出力される。
ビデオ入力装置１１は、前記映像信号Ｖをデジタル画像データ１２に変換し、コンピュータ３に送る。
【００２７】
コンピュータ３は、インターフェース６を介して、前記デジタル画像データ１２を取り込み、メモリ５に格納しているプログラムに従ってＣＰＵ４で処理する。メモリ５には、各種のデータが格納され、必要に応じて参照される。また、処理の必要に応じて、各種情報が外部記憶装置１３に蓄積される。
コンピュータ３に対する命令は、マウス等のポインティングデバイス７やキーボード８を使って行うことが出来る。
ＣＲＴ等のディスプレイ装置１はコンピュータ３の出力画面を表示し、スピーカ２はコンピュータ３の出力音声を発生する。
【００２８】
図２は、ディスプレイ装置１に表示する画面例である。
領域５０には、デジタル画像データ１２に基づく動画像を表示する。
領域６０には、本システムを制御するボタンと本システムの動作状況を表示する。開始ボタン６１は、代表画像抽出処理の実行開始を行なうボタンである。停止ボタン６２は、代表画像抽出処理の実行停止を行なうボタンである。ボタンを押す操作は、ユーザがポインティングデバイス７を操作してカーソル８０をボタン上に位置合わせし、クリックすることで行なう。検出画面数表示６３は、実行開始から現在までに抽出した代表画像の個数である。開始時間表示６４は、代表画像抽出処理の実行開始時刻である。
【００２９】
領域７０には、抽出したｍ個の代表画像を縮小して表示する（図２では、ｍ＝６）。すなわち、動画像のフレームに字幕が存在すると、そのフレームの画像を代表画像として抽出し、適切な大きさに縮小して領域７０に表示する。また、当該代表画像の抽出時間を合わせて表示する。抽出した代表画像が領域７０の表示可能数ｍを越えた場合には、自動スクロールし、最新のｍ個の代表画像だけを表示する。なお、ユーザがスクロールボタン７１，７３を押したり，スクロールバー７２をドラッグすることで、スクロールアウトした代表画像を表示させることが出来る。
【００３０】
図３は、代表画像抽出処理の機能ブロック図である。
動画像入力部１００は、デジタル画像データ１２をメモリ５に取り込み、ディスプレイ装置１の領域５０に動画像を表示する。
特徴抽出部１５０の領域別輝度計数部２００は、動画像の各フレームの画面を複数の領域に区分したときの各領域内の第一の閾値以上の高輝度の画素を検出し、それら画素数を出力する。
特徴抽出部１５０の領域別エッジ計数部３００は、動画像の各フレームの画面を複数の領域に区分したときの各領域内の第二の閾値以上のエッジを検出し、それらエッジ数を出力する。
字幕判定部４００は、前記画素数および前記エッジ数が第三の閾値以上の領域を字幕有りの領域と判別し、字幕有りの領域数を行方向および列方向に投影し、行方向に投影したときの字幕有りの領域数の最大値または列方向に投影したときの字幕有りの領域数の最大値が第四の閾値以上のときに、当該フレームの画像中に字幕が有ると判定する。
代表画像作成部５００は、字幕有りと判定したフレームの画像を縮小して代表画像としてメモリ５に記憶する。
表示部６００は、複数の縮小代表画像と抽出時刻をディスプレイ装置１の領域７０に並べて表示する。
【００３１】
図４は、メモリ５に記憶されるプログラムとデータの構成図である。
プログラム５−１は、代表画像抽出処理のプログラムである。このプログラム５−１は、以下のデータ５−２〜データ５−２７を参照する。
【００３２】
代表画像構造体５−２は、代表画像と付属データ（抽出時刻など）を格納する構造体である（図５に詳細を示す）。この代表画像構造体５−２は、抽出結果として蓄積するデータである。
【００３３】
閾値１（５−３）は、高輝度の画素を検出するための第一の閾値である。
閾値２（５−４）は、強エッジを検出するための第二の閾値である。
閾値３（５−５）は、字幕有りの区分領域を判別するための第三の閾値である。
閾値４（５−６）は、字幕が有るフレームを検出するための第四の閾値である。
上記閾値１（５−３），閾値２（５−４），閾値３（５−５）および閾値４（５−６）は、予め設定しておくデータである。
【００３４】
以下のデータ５−７〜データ５−２７は、１回あたりの処理に利用するワーク用データである。
画像データ５−７は、現在の処理対象のフレームのデジタル画像データであり、［２４０］×［３２０］個（＝画面の画素数：図１８参照）の配列データである。各配列は、赤画像データ５−７−１，緑画像データ５−７−２，青画像データ５−７−３の３種類の色成分データからなっている。
輝度データ５−８は、高輝度の画素の検出結果を示す［２４０］×［３２０］個の配列データである。
横エッジデータ５−９は、画面の横方向の輝度差が大きい画素（強エッジの画素）の検出結果を示す［２４０］×［３２０］個の配列データである。
縦エッジデータ５−１０は、画面の縦方向の輝度差が大きい画素（強エッジの画素）の検出結果を示す［２４０］×［３２０］個の配列データである。
【００３５】
前フレーム輝度データ５−１１は、現在の処理対象のフレームの前フレームの輝度データ（５−８）である。
前フレーム横エッジデータ５−１２は、現在の処理対象のフレームの前フレームの横エッジデータ（５−９）である。
前フレーム縦エッジデータ５−１３は、現在の処理対象のフレームの前フレームの縦エッジデータ（５−１０）である。
【００３６】
輝度照合データ５−１４は、前記輝度データ５−８と前記前フレーム輝度データ５−１１の両方が高輝度の画素を格納した［２４０］×［３２０］個の配列データである。
横エッジ照合データ５−１５は、前記横エッジデータ５−９と前記前フレーム横エッジデータ５−１２の両方が強エッジの画素を格納した［２４０］×［３２０］個の配列データである。
縦エッジ照合データ５−１６は、前記縦エッジデータ５−１０と前記前フレーム縦エッジデータ５−１３の両方が強エッジの画素を格納した［２４０］×［３２０］個の配列データである。
【００３７】
輝度領域データ５−１７は、領域ごとに前記輝度照合データ５−１４の高輝度の画素数を計数した結果を格納した配列データである。これは、［１０］×［１６］個（＝領域数：図１８参照）の配列データである。なお、本実施例では、画面を［１０］×［１６］の領域に区分しているが、１つの領域に字幕の文字が１つ入る程度のサイズに区分するのが好ましい。
横エッジ領域データ５−１８は、領域ごとに前記横エッジ照合データ５−１５の強エッジの画素数（エッジ数）を計数した結果を格納した［１０］×［１６］個の配列データである。
縦エッジ領域データ５−１９は、領域ごとに前記縦エッジ照合データ５−１６の強エッジの画素数（エッジ数）を計数した結果を格納した［１０］×［１６］個の配列データである。
上記輝度データ５−８〜縦エッジ領域データ５−１９は、前記特徴抽出部１５０が作成するデータである。
【００３８】
字幕領域データ５−２０は、領域ごとに字幕の有無の判別結果を格納した［１０］×［１６］個の配列データである。
字幕付属データ５−２１は、字幕が有るときの字幕の位置および方向のデータである。
行カウントデータ５−２２は、行ごとに字幕有りの領域の個数を格納した［１０］個の配列データである。
最大行カウントデータ５−２３は、前記行カウントデータ５−２２の配列データのうちの最大値を格納したデータである。
最大行位置データ５−２４は、前記行カウントデータ５−２２の配列データのうちの最大値に対応する行の行番号を格納したデータである。
列カウントデータ５−２５は、列ごとに字幕有りの領域の個数を格納した［１６］個の配列データである。
最大列カウントデータ５−２６は、前記列カウントデータ５−２５の配列データのうちの最大値を格納したデータである。
最大列位置データ５−２７は、前記列カウントデータ５−２５の配列データのうちの最大値に対応する列の列番号を格納したデータである。
前字幕領域データ５−２８は、現在の処理対象のフレームの前フレームの字幕領域データ（５−２０）である。
領域一致数５−２９は、現在の処理対象のフレームと前フレームとで字幕の有無が一致した領域数である。
上記字幕領域データ５−２０から領域一致数５−２９は、字幕判定部４００が作成するデータである。
【００３９】
図５は、前記代表画像構造体５−２の構成図である。
代表画像識別番号５−２−１は、抽出した代表画像の順番である。
代表画像データ５−２−２は、抽出した画像を縮小した配列データである。これは、［１２０］×［１６０］個（＝画面の画素数の１／２）の配列データである。各配列は、赤画像データ，緑画像データ，青画像データの３種類の色成分データからなっている。
代表画像表示位置Ｘ（５−２−３）および代表画像表示位置Ｙ（５−２−４）は、代表画像を領域７０に表示する際のＸ，Ｙ座標位置である。
字幕開始時間５−２−５は、当該代表画像にかかる字幕が出現した時刻である。
字幕終了時間５−２−６は、当該代表画像にかかる字幕が消失した時刻である。
字幕書式５−２−７は、当該代表画像にかかる字幕の表示方向と位置のデータである。
【００４０】
図６，図７，図８は、領域別輝度計数部２００における処理手順を示すフロー図である。
図６の処理２０１では、画素横位置カウンタＸおよび画素縦位置カウンタＹを“０”に初期化する。
処理２０２では、赤画像データ５−７−１，緑画像データ５−７−２，青画像データ５−７−３の配列［Ｙ］［Ｘ］の輝度値が閾値１（５−３）以上であるか否かを調べ、３色ともに閾値１以上の輝度であれば処理２０３へ移り、閾値１未満ならば処理２０４へ移る。
処理２０３では、輝度データ５−８の配列［Ｙ］［Ｘ］に“１”を書き込む。
処理２０４では、輝度データ５−８の配列［Ｙ］［Ｘ］に“０”を書き込む。
処理２０５〜処理２０９は、上記処理２０２〜処理２０４を全ての画素に対して行うためのアドレス更新処理である。上記処理２０２〜処理２０４を全ての画素に対して行って輝度データ５−８を作成完了すると、図７の処理２１０に移る。
【００４１】
図７の処理２１０では、画素横位置カウンタＸおよび画素縦位置カウンタＹを“０”に初期化する。
処理２１１では、輝度データ５−８の配列［Ｙ］［Ｘ］の値と前フレーム輝度データ５−１１の配列［Ｙ］［Ｘ］の値が両方とも“１”であるかどうかを調べ、両方とも“１”ならば処理２１２へ移り、そうでなければ処理２１３へ移る。
処理２１２では、輝度照合データ５−１４の配列［Ｙ］［Ｘ］に“１”を書き込む。
処理２１３では、輝度照合データ５−１４の配列［Ｙ］［Ｘ］に“０”を書き込む。
処理２１４〜処理２１８は、上記処理２１１〜処理２１３を全ての画素に対して行うためのアドレス更新処理である。上記処理２１１〜処理２１３を全ての画素に対して行って輝度照合データ５−１４を作成完了すると、処理２１９に移る。
【００４２】
処理２１９では、画素横位置カウンタＸおよび画素縦位置カウンタＹを“０”に初期化する。
処理２２０では、輝度データ５−８の配列［Ｙ］［Ｘ］の内容を前フレーム輝度データ５−１１の配列［Ｙ］［Ｘ］に複写する。
処理２２１〜処理２２５は、上記処理２２０を全ての画素に対して行うためのアドレス更新処理である。上記処理２２０を全ての画素に対して行って前フレーム輝度データ５−１１を更新完了すると、図８の処理２２６に移る。
【００４３】
図８の処理２２６では、領域内画素横位置カウンタｉおよび領域内画素縦位置カウンタｊおよび領域横位置カウンタＸｂおよび領域縦位置カウンタＹｂを“０”に初期化する。また、輝度領域データ５−１７を“０”に初期化する。
処理２２７では、輝度照合データ５−１４の配列［Ｙｂ＊２４＋ｊ］［Ｘｂ＊２０＋ｉ］の内容が“１”かどうかを調べ、“１”であれば処理２２８へ移り、そうでなければ処理２２９へ移る。
処理２２８では、輝度領域データ５−１７の配列［Ｙｂ］［Ｘｂ］に“１”を加える。
処理２２９〜処理２３９は、上記処理２２７，処理２２８を全ての画素に対して行うためのアドレス更新処理である。上記処理２２７，処理２２８を全ての画素に対して行って輝度領域データ５−１７を作成完了すると、領域別輝度計数部２００における処理を終了する。
【００４４】
図９，図１０，図１１は、領域別エッジ計数部３００における処理手順を示すフロー図である。
図９の処理３０１では、画素横位置カウンタＸおよび画素縦位置カウンタＹを“１”に初期化する。
処理３０２では、赤画像データ５−７−１，緑画像データ５−７−２，青画像データ５−７−３の配列［Ｙ］［Ｘ＋１］の輝度値と配列［Ｙ］［Ｘ−１］の輝度値の差が閾値２（５−４）以上であるか否かを調べ、３色ともに輝度値の差が閾値２以上であれば処理３０３へ移り、閾値２未満ならば処理３０４へ移る。
処理３０３では、横エッジデータ５−９（図４）の配列［Ｙ］［Ｘ］に“１”を書き込む。
処理３０４では、横エッジデータ５−９（図４）の配列［Ｙ］［Ｘ］に“０”を書き込む。
処理３０５では、赤画像データ５−７−１，緑画像データ５−７−２，青画像データ５−７−３の配列［Ｙ＋１］［Ｘ］の輝度値と配列［Ｙ−１］［Ｘ］の輝度値の差が閾値２（５−４）以上であるか否かを調べ、３色ともに輝度値の差が閾値２以上であれば処理３０６へ移り、閾値２未満ならば処理３０７へ移る。
処理３０６では、縦エッジデータ５−１０（図４）の配列［Ｙ］［Ｘ］に“１”を書き込む。
処理３０７では、縦エッジデータ５−１０（図４）の配列［Ｙ］［Ｘ］に“０”を書き込む。
処理３０８〜処理３１２は、上記処理３０２〜処理３０７を全ての画素に対して行うためのアドレス更新処理である。上記処理３０２〜処理３０７を画面の縁の画素を除く全ての画素に対して行って横エッジデータ５−９および縦エッジデータ５−１０を作成完了すると、図１０の処理３１３に移る。
【００４５】
図１０の処理３１３では、画素横位置カウンタＸおよび画素縦位置カウンタＹを“０”に初期化する。
処理３１４では、横エッジデータ５−９の配列［Ｙ］［Ｘ］の値と前フレーム横エッジデータ５−１２の配列［Ｙ］［Ｘ］の値が共に“１”であるかどうかを調べ、両方とも“１”ならば処理３１５へ移り、そうでなければ処理３１６へ移る。
処理３１５では、横エッジ照合データ５−１５の配列［Ｙ］［Ｘ］に“１”を書き込む。
処理３１６では、横エッジ照合データ５−１５の配列［Ｙ］［Ｘ］に“０”を書き込む。
処理３１７では、縦エッジデータ５−１０の配列［Ｙ］［Ｘ］の値と前フレーム縦エッジデータ５−１３の配列［Ｙ］［Ｘ］の値が共に“１”であるか否かを調べ、両方とも“１”ならば処理３１８へ移り、そうでなければ処理３１９へ移る。
処理３１８では、縦エッジ照合データ５−１６の配列［Ｙ］［Ｘ］に“１”を書き込む。
処理３１９では、縦エッジ照合データ５−１６の配列［Ｙ］［Ｘ］に“０”を書き込む。
処理３２０〜処理３２４は、上記処理３１４〜処理３１９を全ての画素に対して行うためのアドレス更新処理である。上記処理３１４〜処理３１９を全ての画素に対して行って横エッジ照合データ５−１５および縦エッジ照合データ５−１６を作成完了すると、処理３２５に移る。
【００４６】
処理３２５では、画素横位置カウンタＸおよび画素縦位置カウンタＹを“０”に初期化する。
処理３２６では、横エッジデータ５−９の配列［Ｙ］［Ｘ］の内容を前フレーム横エッジデータ５−１２の配列［Ｙ］［Ｘ］に複写する。また、縦エッジデータ５−１０の配列［Ｙ］［Ｘ］の内容を前フレーム縦エッジデータ５−１３の配列［Ｙ］［Ｘ］に複写する。
処理３２７〜処理３３１は、上記処理３２６を全ての画素に対して行うためのアドレス更新処理である。上記処理３２６を全ての画素に対して行って前フレーム横エッジデータ５−１２および前フレーム縦エッジデータ５−１３を更新完了すると、図１１の処理３３２に移る。
【００４７】
図１１の処理３３２では、領域内画素横位置カウンタｉおよび領域内画素縦位置カウンタｊおよび領域横位置カウンタＸｂおよび領域縦位置カウンタＹｂを“０”に初期化する。また、横エッジ領域データ５−１８および縦エッジ領域データ５−１９を“０”に初期化する。
処理３３３では、横エッジ照合データ５−１５の配列［Ｙｂ＊２４＋ｊ］［Ｘｂ＊２０＋ｉ］の内容が“１”かどうかを調べ、“１”であれば処理３３４へ移り、そうでなければ処理３３５へ移る。
処理３３４では、横エッジ領域データ５−１８の配列［Ｙｂ］［Ｘｂ］に“１”を加える。
処理３３５では、縦エッジ照合データ５−１６の配列［Ｙｂ＊２４＋ｊ］［Ｘｂ＊２０＋ｉ］の内容が“１”かどうかを調べ、“１”であれば処理３３６へ移り、そうでなければ処理３３７へ移る。
処理３３６では、縦エッジ領域データ５−１９の配列［Ｙｂ］［Ｘｂ］に“１”を加える。
処理３３７〜処理３４８は、上記処理３３３〜処理３３６を全ての画素に対して行うためのアドレス更新処理である。上記処理３３３〜処理３３６を全ての画素に対して行って横エッジ領域データ５−１８および縦エッジ領域データ５−１９を作成完了すると、領域別エッジ計数部３００における処理を終了する。
【００４８】
図１２，図１３，図１４は、字幕判定部４００および代表画像作成部５００における処理手順を示すフロー図である。なお、字幕判定部４００の処理を参照番号４ｘｘで示し、代表画像作成部５００の処理を参照番号５ｘｘで示す。
図１２の処理４０１では、領域横位置カウンタＸｂおよび領域縦位置カウンタＹｂを“０”に初期化する。
処理４０２では、輝度領域データ５−１７の配列［Ｙｂ］［Ｘｂ］の値と横エッジ領域データ５−１８の配列［Ｙｂ］［Ｘｂ］の値と縦エッジ領域データ５−１９の配列［Ｙｂ］［Ｘｂ］の値が共に閾値３（５−５）以上であるか否かを調べ、共に閾値３以上ならば処理４０３へ移り、そうでなければ処理４０４へ移る。
処理４０３では、字幕領域データ５−２０の配列［Ｙｂ］［Ｘｂ］に“１”を書き込む。“１”を書き込んだ配列に対応する領域が字幕有りの領域である。
処理４０４では、字幕領域データ５−２０の配列［Ｙｂ］［Ｘｂ］に“０”を書き込む。“０”を書き込んだ配列に対応する領域が字幕無しの領域である。
処理４０５〜処理４０９は、上記処理４０２〜処理４０４を全ての領域に対して行うためのアドレス更新処理である。上記処理４０２〜処理４０４を全ての領域に対して行って字幕領域データ５−２０を作成完了すると、図１３の処理４１０に移る。
【００４９】
図１３の処理４１０では、領域横位置カウンタＸｂおよび領域縦位置カウンタＹｂを“０”に初期化する。また、行カウントデータ５−２２を“０”に初期化する。
処理４１１では、行カウントデータ５−２２の配列［Ｙｂ］に字幕領域データの配列［Ｙｂ］［Ｘｂ］の内容を加算する。
処理４１２〜処理４１６は、上記処理４１１を全ての領域に対して行うためのアドレス更新処理である。上記処理４１１を全ての領域に対して行って行カウントデータ５−２２を作成完了すると、処理４１７に移る。
処理４１７では、領域横位置カウンタＸｂおよび領域縦位置カウンタＹｂを“０”に初期化する。又、列カウントデータ５−２５を“０”に初期化する。
処理４１８では、列カウントデータ５−２５の配列［Ｘｂ］に字幕領域データの配列［Ｙｂ］［Ｘｂ］の内容を加算する。
処理４１９〜処理４２３は、上記処理４１８を全ての領域に対して行うためのアドレス更新処理である。上記処理４１８を全ての領域に対して行って列カウントデータ５−２５を作成完了すると、図１４の処理４２４に移る。
【００５０】
図１４の処理４２４では、領域横位置カウンタＸｂおよび領域縦位置カウンタＹｂを“０”に初期化する。また、最大行カウントデータ５−２３および最大列カウントデータ５−２６を“０”に初期化する。
処理４２５では、行カウントデータ５−２２の配列［Ｙｂ］の値が最大行カウントデータ５−２３より大きいかを調べ、大きければ処理４２６へ移り、大きくなければ処理４２８に移る。
処理４２６では、行カウントデータ５−２２の配列［Ｙｂ］の値を最大行カウントデータ５−２３に複写する。
処理４２７では、最大行位置データ５−２４に“Ｙｂ”の値を記憶する。
処理４２８および処理４２９は、上記処理４２５〜処理４２７を全ての行に対して行うためのアドレス更新処理である。上記処理４２５〜処理４２７を全ての行に対して行って最大行カウントデータ５−２３および最大行位置データ５−２４を作成完了すると、処理４３０に移る。
処理４３０では、列カウントデータ５−２５の配列［Ｘｂ］の値が最大列カウントデータ５−２６より大きいかを調べ、大きければ処理４３１へ移り、大きくなければ処理４３３に移る。
処理４３１では、列カウントデータ５−２５の配列［Ｘｂ］の値を最大列カウントデータ５−２６に複写する。
処理４３２では、最大列位置データ５−２７に“Ｘｂ”の値を記憶する。
処理４３３および処理４３４は、上記処理４３０〜処理４３２を全ての列に対して行うためのアドレス更新処理である。上記処理４３０〜処理４３２を全ての列に対して行って最大列カウントデータ５−２６および最大列位置データ５−２７を作成完了すると、処理４３５に移る。
【００５１】
処理４３５では、最大行カウントデータ５−２３が閾値４（５−６）以上であるか又は最大列カウントデータ５−２６が閾値４以上であるか否かを調べる。最大行カウントデータ５−２３が閾値４以上であるか又は最大列カウントデータ５−２６が閾値４以上であれば、当該フレームの画像中に字幕有りと判定し、処理４３６へ移る。最大行カウントデータ５−２３が閾値４未満であり且つ最大列カウントデータ５−２６が閾値４未満であれば、当該フレームの画像中に字幕無しと判定し、図１７の処理４７１に移る。
処理４３６では、最大行カウントデータ５−２３が最大列カウントデータ５−２６以上であるか否かを調べる。最大行カウントデータ５−２３が最大列カウントデータ５−２６以上であれば、「字幕が横書きである」と判定し、処理４３７に移る。最大行カウントデータ５−２３が最大列カウントデータ５−２６以上でなければ、「字幕は縦書きである」と判定し、処理４４０に移る。
【００５２】
処理４３７では、最大行位置データ５−２４が“５”行目（画面の中段の行）以上であるかを調べ、“５”以上であれば「字幕は画面の上半分に横書き」と判断し、処理４３８へ移り、“５”未満であれば「字幕は下半分に横書き」と判断し、処理４３９へ移る。
処理４３８では、字幕付属データ５−２１に“上横書き”を書き込む。
処理４３９では、字幕付属データ５−２１に“下横書き”を書き込む。そして、図１５の処理４５１に移る。
【００５３】
一方、処理４４０では、最大列位置データ５−２７が“８”列目（画面の中央の列）以上であるかを調べ、“８”以上であれば「字幕は画面の右半分に縦書き」と判断し、処理４４１へ移り、“８”未満であれば「字幕は画面の左半分に縦書き」と判断し、処理４４２へ移る。
処理４４１では、字幕付属データ５−２１に“右縦書き”を書き込む。
処理４４２では、字幕付属データ５−２１に“左縦書き”を書き込む。そして、図１５の処理４５１に移る。
【００５４】
図１５の処理４５１では、領域横位置カウンタＸｂ及び領域縦位置カウンタＹｂを“０”に初期化する。又、領域一致数５−２９を“０”に初期化する。
処理４５２では、字幕領域データ５−２０の配列［Ｙｂ］［Ｘｂ］の値と前字幕領域データ５−２８の配列［Ｙｂ］［Ｘｂ］の値が一致するかどうかを調べ、一致すれば処理４５３へ移り、一致しなければ処理４５４へ移る。
処理４５３では、領域一致数５−２９に“１”を加える。
処理４５４から処理４５８は、上記処理４５２および処理４５３を全ての領域に対して行うためのアドレス更新処理である。上記処理４５２，処理４５３を全ての領域に対して行って領域一致数５−２９を作成完了すると、処理４５９に移る。
【００５５】
処理４５９では、領域一致数５−２９を領域数“１６０”で割って一致度を求め、その一致度が“０．７”未満か否かを調べる。一致度が“０．７”未満なら、字幕が変化したと判断し、処理５０１へ移る。一致度が“０．７”以上なら、字幕が変化していないと判断し、図１６の処理４６１へ移る。なお、本実施例では一致度の閾値を“０．７”としたが、任意に設定可能である。
処理５０１では、新たな代表画像構造体５−２を生成し、その代表画像構造体５−２の代表画像識別番号５−２−１に、前回生成した代表画像構造体５−２の代表画像識別番号５−２−１に“１”を加えた値を設定する。また、字幕開始時間５−２−５に現在時刻を格納し、字幕書式５−２−７に字幕付属データ５−２１を複写する。
処理５０２では、画素横位置カウンタＸおよび画素縦位置カウンタＹを“０”に初期化する。
処理５０３では、代表画像データ５−２−２の配列［Ｙ］［Ｘ］に緑画像データ５−７−２の配列［Ｙ＊２］［Ｘ＊２］の輝度値を複写する。
処理５０４〜処理５０８は、上記処理５０３を代表画像の全ての画素に対して行うためのアドレス更新処理である。上記処理５０３を代表画像の全ての画素に対して行って代表画像データ５−２−２を作成完了すると、図１６の処理４６１に移る。なお、代表画像データ５−２−２は、緑画像データ５−７−２の１／２縮小画像となる。
【００５６】
図１６の処理４６１では、領域横位置カウンタＸｂおよび領域縦位置カウンタＹｂを“０”に初期化する。
処理４６２では、前字幕領域データ５−２８の配列［Ｙｂ］［Ｘｂ］に字幕領域データ５−２０の配列［Ｙｂ］［Ｘｂ］の値を複写する。
処理４６３から処理４６７は、上記処理４６２を全ての領域に対して行うためのアドレス更新処理である。上記処理４６２を全ての領域に対して行って前字幕領域データ５−２８を更新完了すると、処理４６８に移る。
処理４６８では、代表画像構造体５−２の字幕終了時間５−２−６に現在時刻を格納する。そして、字幕判定部４００における処理を終了する。
【００５７】
一方、図１７の処理４７１では、領域横位置カウンタＸｂおよび領域縦位置カウンタＹｂを“０”に初期化する。
処理４７２では、前字幕領域データ５−２８の配列［Ｙｂ］［Ｘｂ］に“０”を格納する。
処理４７３から処理４７７は、上記処理４７２を全ての領域に対して行うためのアドレス更新処理である。上記処理４７２を全ての領域に対して行って前字幕領域データ５−２８を更新完了すると、字幕判定部４００における処理を終了する。
【００５８】
以上の動画像の代表画像抽出装置１０００によれば、特徴抽出部１５０によって、領域別に字幕が現われているかどうかを判定しているので、字幕の文字数が画面全体で少ない場合であっても、字幕を好適に検出可能である。また、特徴抽出部１５０は、字幕の特徴として高輝度の画素と強エッジの画素の両方をチェックしているので、ライト照明のようなエッジが無くかつ高輝度の背景や将棋盤のようにエッジは有るが輝度の低い背景は字幕と区別されるため、誤抽出を防止できる。また、字幕判定部４００によって、字幕有無の情報を行方向および列方向に投影して判断しているので、字幕が縦書きでも横書きでも対応可能であり、また、現われた字幕が縦書きか横書きであるかを区別可能である。さらに、代表画像作成部５００によって縮小した代表画像を作成し、表示部６００によって複数の縮小代表画像を一覧表示するため、代表画像の検索が容易になる。
【００５９】
【発明の効果】
本発明の字幕検出方法によれば、字幕の表示態様が任意である一般の画像に対して字幕が有るか否かを判定することが出来るようになる。
また、画像自体の変化が少なく，字幕のみが変化するような場合でも、必要な代表画像を抽出することが出来る。
【図面の簡単な説明】
【図１】本発明の一実施例の動画像の代表画像抽出装置のシステム構成図である。
【図２】ディスプレイ装置に表示する画面の例示図である。
【図３】代表画像抽出処理の機能ブロック図である。
【図４】メモリに記憶されるプログラムとデータの構成図である。
【図５】代表画像構造体の構成図である。
【図６】領域別輝度計数部における高輝度の画素を抽出する処理のフロー図である。
【図７】領域別輝度計数部における複数のフレームに渡り高輝度が継続している画素を抽出する処理のフロー図である。
【図８】領域別輝度計数部における領域別に高輝度の画素数を計数する処理のフロー図である。
【図９】領域別エッジ計数部における縦エッジおよび横エッジの画素を抽出する処理のフロー図である。
【図１０】領域別エッジ計数部における複数のフレームに渡り強エッジが継続している画素を抽出する処理のフロー図である。
【図１１】領域別エッジ計数部における領域ごとに縦エッジ数および横エッジ数を計数する処理のフロー図である。
【図１２】字幕判定部における領域ごとに字幕有無を判別する処理のフロー図である。
【図１３】字幕判定部における字幕有りの領域を行方向および列方向に投影する処理のフロー図である。
【図１４】字幕判定部における字幕有りの画像を判定する処理のフロー図である。
【図１５】字幕判定部における字幕有りの画像の連続性を判定する処理のフロー図である。
【図１６】字幕判定部における字幕有りの画像の連続性を判定する処理の続きのフロー図である。
【図１７】字幕判定部における字幕無しの画像についての処理のフロー図である。
【図１８】複数の領域に区分した画面の説明図である。
【符号の説明】
１…ディスプレィ装置、２…スピーカ、３…コンピュータ、４…ＣＰＵ、
５…メモリ、６…インタフェース、７…ポインティングデバイス、
８…キーボード、９…ビデオ再生装置、１０…制御信号、
１１…ビデオ入力装置、１２…ディジタル画像データ、
１３…外部情報記憶装置、
１００…動画入力部、１５０…特徴抽出部、２００…領域別輝度計数部、
３００…領域別エッジ計数部、４００…字幕判定部、
５００…代表画像作成部、６００…表示部、
１０００…動画像の代表画像抽出装置。
[0001]
[Industrial applications]
The present invention relates to a caption detection method and, more specifically, to a caption detection method for determining whether or not a caption is present in an image.
[0002]
[Prior art]
There is the following conventional technique for the caption detection method.
JP-A-5-137066 discloses a technique for extracting an edge component of a video signal to identify a subtitle portion and a background portion in a karaoke video.
In addition, "Content Identification Method Based on Recognition from Sumo Matching, Proceedings of the 44th Annual Conference of the IPSJ, 2-301" discloses a technique in which the screen is divided into a left part and a right part, and the competing wrestlers are recognized from the caption written vertically in the left part and the caption written vertically in the right part.
[0003]
As a representative image extracting device for a moving image, there is the following conventional technology.
Japanese Patent Application Laid-Open No. Hei 5-244475 proposes a technique in which a change point of an image is obtained based on a difference between frames, and an image giving the change point is extracted as a representative image.
[0004]
As other related prior arts, there are techniques disclosed in Japanese Patent Application Laid-Open Nos. 3-273363 and 3-292572.
[0005]
[Problems to be solved by the invention]
The caption detection method disclosed in Japanese Patent Application Laid-Open No. Hei 5-137066 is based on the premise that captions are written horizontally, and cannot be used for captions written vertically. In other words, there is a problem that it can handle karaoke videos but cannot handle general images.
In addition, the conventional technology disclosed in the above-mentioned "Content identification method based on recognition from sumo wrestling match, Proceedings of the 44th Annual Conference of IPSJ, 2-301" is premised on captions being written vertically in each of the left and right parts of the screen, and therefore also has the problem that it cannot handle general images.
Therefore, a first object of the present invention is to provide a subtitle detection method capable of determining whether or not a general image whose subtitle display mode is arbitrary has a subtitle.
[0006]
In the moving image representative image extracting apparatus disclosed in Japanese Patent Application Laid-Open No. 5-244475, representative images are extracted by paying attention only to changes in the image, so there is a problem that the necessary representative images cannot be extracted when the image itself changes little. For example, in the case of an image in which an announcer reads several news items one after another, the image itself changes little and only the caption changes, so it may not be possible to extract a representative image for each news item.
Therefore, a second object of the present invention is to provide a caption detection method capable of detecting captions and extracting representative images based on the detection result.
[0007]
[Means for Solving the Problems]
[0008]
In a first aspect, the present invention provides a caption detection method characterized by: dividing an image into a plurality of regions; counting, for each region, the number of high-luminance pixels equal to or greater than a first threshold and the number of edges having a luminance difference equal to or greater than a second threshold; determining a region in which the number of pixels is equal to or greater than a third threshold and the number of edges is equal to or greater than the third threshold to be a captioned region; projecting the number of captioned regions in the row direction and the column direction; and determining that a caption is present in the image when the maximum number of captioned regions projected in the row direction or the maximum number of captioned regions projected in the column direction is equal to or greater than a fourth threshold.
[0009]
In a second aspect, the present invention provides the caption detection method having the above configuration, characterized in that the high-luminance pixels and edges counted are those that have been present at the same location continuously for at least the past two frames.
[0010]
In a third aspect, the present invention provides the caption detection method having the above configuration, characterized in that edges whose horizontal luminance difference is equal to or greater than the second threshold and edges whose vertical luminance difference is equal to or greater than the second threshold are counted.
[0011]
In a fourth aspect, the present invention provides the caption detection method having the above configuration, characterized in that the caption is determined to be written horizontally when the maximum number of captioned regions projected in the row direction is larger than the maximum number of captioned regions projected in the column direction, and is otherwise determined to be written vertically.
[0012]
In a fifth aspect, the present invention provides the caption detection method having the above configuration, characterized in that a representative image is selected from among the images determined to have a caption.
[0014]
In a sixth aspect, the present invention provides the caption detection method having the above configuration, characterized in that, when the images determined to have a caption are temporally consecutive frames, only the image of one of those frames is selected as a representative image.
[0015]
In a seventh aspect, the present invention provides the caption detection method having the above configuration, characterized in that the extracted representative images are reduced and displayed side by side on the screen.
[0016]
[Action]
In the caption detection method according to the first aspect, the image is divided into a plurality of regions, the feature amount of the caption is calculated for each region, and it is determined whether or not each region is a region with a caption based on the feature amount. Then, the number of areas with captions is projected in the row direction and the column direction, and it is determined whether or not captions are present in the image based on the projection results.
According to this, since the presence or absence of a caption is determined for each of the divided regions, the caption can be detected even when the number of caption characters on the whole screen is small. In addition, since the number of captioned regions is projected in the row and column directions and the presence of a caption in the image is determined from the projection result, both horizontally and vertically written captions can be handled, and there is no restriction on the display position of the caption. Therefore, it is possible to determine whether or not there is a caption for a general image whose caption display mode is arbitrary.
[0017]
Further, in the caption detection method according to the first aspect, the image is divided into a plurality of regions, and the number of high-luminance pixels equal to or greater than the first threshold and the number of edges having a luminance difference equal to or greater than the second threshold are counted for each region. A region in which the number of pixels is equal to or greater than a third threshold and the number of edges is equal to or greater than the third threshold is determined to be a captioned region. Then, the number of captioned regions is projected in the row and column directions, and when the maximum number of captioned regions projected in the row direction or the maximum number of captioned regions projected in the column direction is equal to or greater than a fourth threshold, it is determined that a caption is present in the image.
According to this, in addition to the above effects, the number of high-luminance pixels is counted, so characters composed of pixels with higher luminance than the background can be suitably discriminated. Further, since the number of strong edges is counted, characters, in which edges appear more frequently than in the background, can be suitably discriminated. Since the presence or absence of a caption in a region is determined from both the number of high-luminance pixels and the number of strong edges, the determination is highly accurate.
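The region determination and projection of the first aspect can be illustrated with a minimal sketch. It assumes NumPy and per-region counts already arranged as a 10 x 16 grid (the grid size used in the embodiment); the function names, the sample counts, and the threshold values are hypothetical and are chosen only for illustration.

```python
import numpy as np

def caption_regions(bright_counts, edge_counts, t3):
    """Mark a region as captioned when both counts reach the third threshold."""
    return (bright_counts >= t3) & (edge_counts >= t3)

def frame_has_caption(region_map, t4):
    """Project captioned regions onto rows and columns and test the maxima."""
    row_counts = region_map.sum(axis=1)  # captioned regions in each row
    col_counts = region_map.sum(axis=0)  # captioned regions in each column
    return bool(max(row_counts.max(), col_counts.max()) >= t4)

# Hypothetical 10 x 16 grid: a horizontal caption in row 8, columns 3 to 12.
bright = np.zeros((10, 16), dtype=int)
edges = np.zeros((10, 16), dtype=int)
bright[8, 3:13] = 50
edges[8, 3:13] = 40
bright[2, 5] = 200  # bright but edge-free region (for example a lamp)

regions = caption_regions(bright, edges, t3=30)  # t3 is a hypothetical value
print(frame_has_caption(regions, t4=5))          # t4 is a hypothetical value
```

Because the bright but edge-free region fails the edge test, it does not contribute to either projection, which corresponds to the combined use of both features described above.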
[0018]
In the caption detection method according to the second aspect, the high-luminance pixels and edges counted are those that have been present at the same location continuously for at least the past two frames.
In a moving image, the background pixels are likely to change, but the subtitles are displayed without changing for a certain period of time until the viewer finishes reading. Therefore, by comparing with a past frame, a pixel or an edge relating to a caption can be detected with high accuracy.
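As a rough illustration of this comparison with a past frame, the following sketch keeps only pixels that are flagged as high-luminance in both the current and the previous frame. It assumes NumPy and RGB frames stored as height x width x 3 arrays; requiring all three colour components to reach the first threshold follows the embodiment, while the function names, frame sizes, and threshold value are hypothetical.

```python
import numpy as np

def bright_mask(rgb, t1):
    """True where all three colour components reach the first threshold."""
    return (rgb >= t1).all(axis=-1)

def persistent_mask(current, previous):
    """Keep only pixels flagged in both the current and the previous frame."""
    return current & previous

# Two hypothetical 4 x 6 RGB frames: a static caption row plus changing background.
prev_frame = np.zeros((4, 6, 3), dtype=np.uint8)
curr_frame = np.zeros((4, 6, 3), dtype=np.uint8)
prev_frame[3, 1:5] = 255           # caption pixels, present in both frames
curr_frame[3, 1:5] = 255
prev_frame[0, 0] = 255             # bright background pixel, previous frame only
curr_frame[1, 2] = 255             # bright background pixel, current frame only
curr_frame[2, 2] = (255, 255, 10)  # not bright in all three colours

t1 = 200  # hypothetical first threshold
stable = persistent_mask(bright_mask(curr_frame, t1), bright_mask(prev_frame, t1))
print(int(stable.sum()))  # prints 4: only the caption pixels remain
```

The same element-wise AND can be applied to the edge masks, so that only edges persisting across frames are counted.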
[0019]
In the caption detection method according to the third aspect, edges whose luminance difference in the horizontal direction is equal to or greater than the second threshold and edges whose luminance difference in the vertical direction is equal to or greater than the second threshold are counted.
For example, in a background such as a window blind, edges appear frequently. However, since only one of the horizontal edge and the vertical edge appears, by considering both, background edges such as window blinds are not counted, and erroneous determination can be prevented.
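The window blind case can be reproduced with a small sketch. It assumes NumPy and a single-channel luminance image for brevity (the embodiment tests all three colour components), compares the two neighbours on either side of each pixel as in the embodiment, and takes the absolute difference, which is an assumption because the text only speaks of a "difference". The function names and threshold values are hypothetical.

```python
import numpy as np

def edge_masks(gray, t2):
    """Return (horizontal-difference mask, vertical-difference mask).

    Border pixels are skipped because they lack a neighbour on one side.
    """
    g = gray.astype(np.int32)
    h = np.zeros(g.shape, dtype=bool)
    v = np.zeros(g.shape, dtype=bool)
    h[1:-1, 1:-1] = np.abs(g[1:-1, 2:] - g[1:-1, :-2]) >= t2  # left vs right
    v[1:-1, 1:-1] = np.abs(g[2:, 1:-1] - g[:-2, 1:-1]) >= t2  # above vs below
    return h, v

def region_has_text_edges(h, v, t3):
    """Require both edge directions to reach the third threshold."""
    return bool(h.sum() >= t3 and v.sum() >= t3)

# Hypothetical 12 x 12 "window blind": horizontal stripes two pixels thick.
rows = np.array([0, 0, 255, 255] * 3)
blinds = np.repeat(rows[:, None], 12, axis=1)

h, v = edge_masks(blinds, t2=100)
print(int(h.sum()), int(v.sum()))            # prints 0 100
print(region_has_text_edges(h, v, t3=20))    # prints False
```

The stripes produce many strong differences in one direction and none in the other, so a region covered by them is rejected when both directions are required.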
[0020]
In the caption detection method according to the fourth aspect, when the maximum number of captioned regions projected in the row direction is larger than the maximum number of captioned regions projected in the column direction, the caption is determined to be written horizontally; otherwise, the caption is determined to be written vertically.
This makes it possible to detect a subtitle format.
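A minimal sketch of this comparison follows, assuming NumPy and a boolean 10 x 16 map of captioned regions. The function name and the sample maps are hypothetical. The sketch uses a strict "larger than" comparison as stated in the fourth aspect; the flow described in the embodiment treats equal maxima as horizontal writing.

```python
import numpy as np

def caption_orientation(region_map):
    """Compare the row-direction and column-direction projection maxima."""
    row_max = int(region_map.sum(axis=1).max())
    col_max = int(region_map.sum(axis=0).max())
    return "horizontal" if row_max > col_max else "vertical"

# Hypothetical maps: one caption along row 8, one along column 14.
horizontal = np.zeros((10, 16), dtype=bool)
horizontal[8, 2:14] = True
vertical = np.zeros((10, 16), dtype=bool)
vertical[1:9, 14] = True

print(caption_orientation(horizontal), caption_orientation(vertical))
# prints: horizontal vertical
```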
[0021]
In the caption detection method according to the fifth aspect, a representative image is selected from among the images determined to have a caption.
As described above, since the image having the caption is detected and the representative image is extracted from the image, the representative image can be appropriately extracted even from a moving image in which the image itself has little change and only the caption changes.
[0023]
In the caption detection method according to the sixth aspect, when the images determined to have a caption are temporally consecutive, only the image of one of those frames is selected as a representative image.
As a result, for example, the image at the point where the caption changes can be extracted.
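One way to pick a single frame from a run of consecutive captioned frames is sketched below. It follows the outline of the embodiment, which compares the captioned-region map of the current frame with that of the previous frame, treats a match ratio below 0.7 as a caption change, and clears the previous map when a frame has no caption. NumPy, the function name, and the tiny 2 x 4 grids are assumptions made for illustration; the embodiment uses 160 regions and notes that the 0.7 value can be set freely.

```python
import numpy as np

def select_representatives(region_maps, has_caption, min_match=0.7):
    """Return indices of frames where the caption layout newly appears or changes."""
    prev = np.zeros_like(region_maps[0])
    selected = []
    for i, (regions, present) in enumerate(zip(region_maps, has_caption)):
        if not present:
            prev = np.zeros_like(regions)  # no caption: forget the previous map
            continue
        match = np.mean(regions == prev)   # fraction of regions that agree
        if match < min_match:
            selected.append(i)             # caption judged to have changed
        prev = regions
    return selected

# Hypothetical 2 x 4 region maps: caption A in the bottom row, caption B in the top row.
a = np.array([[0, 0, 0, 0], [1, 1, 1, 1]], dtype=bool)
b = np.array([[1, 1, 1, 1], [0, 0, 0, 0]], dtype=bool)
none = np.zeros((2, 4), dtype=bool)

frames = [a, a, a, none, b, b]
flags = [True, True, True, False, True, True]
print(select_representatives(frames, flags))  # prints [0, 4]
```

Only the first frame of each run is selected, so a caption that stays on screen for many frames yields one representative image.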
[0024]
In the caption detection method according to the seventh aspect, the extracted representative images are reduced and displayed side by side on the screen.
As a result, a plurality of representative images can be listed, and the user can easily find a desired scene.
[0025]
[Embodiment]
Hereinafter, the present invention will be described in detail with reference to the drawings. Note that the present invention is not limited to this.
[0026]
FIG. 1 is a system configuration diagram of a representative image extraction device for moving images that implements the caption detection method of the present invention.
In the moving image representative image extracting device 1000, the video reproducing device 9 is a device such as an optical disc player or a video deck for reproducing a moving image. Each frame of the moving image handled by the video reproducing device 9 is assigned a frame number in order from the beginning of the moving image. When a frame number is sent from the computer 3 to the video reproducing device 9 by the control signal 10, the moving image of the corresponding frame is reproduced, and the video signal V is output to the video input device 11.
The video input device 11 converts the video signal V into digital image data 12 and sends it to the computer 3.
[0027]
The computer 3 captures the digital image data 12 via the interface 6 and processes the digital image data 12 with the CPU 4 according to a program stored in the memory 5. Various data are stored in the memory 5 and are referred to as needed. Further, various kinds of information are stored in the external storage device 13 as necessary for processing.
Commands to the computer 3 can be issued by using a pointing device 7 such as a mouse or a keyboard 8.
A display device 1 such as a CRT displays an output screen of the computer 3, and a speaker 2 generates an output sound of the computer 3.
[0028]
FIG. 2 is an example of a screen displayed on the display device 1.
In the area 50, a moving image based on the digital image data 12 is displayed.
In an area 60, buttons for controlling the present system and the operation status of the present system are displayed. The start button 61 is a button for starting execution of the representative image extraction process. The stop button 62 is a button for stopping the execution of the representative image extraction processing. The operation of pressing the button is performed when the user operates the pointing device 7 to position the cursor 80 on the button and clicks the button. The detection screen number display 63 is the number of representative images extracted from the start of execution to the present. The start time display 64 is the execution start time of the representative image extraction process.
[0029]
In the area 70, the extracted m representative images are reduced and displayed (in FIG. 2, m = 6). That is, when a subtitle exists in a frame of a moving image, the image of the frame is extracted as a representative image, reduced to an appropriate size, and displayed in the area 70. Also, the extraction time of the representative image is displayed together. When the extracted representative image exceeds the displayable number m of the area 70, the display is automatically scrolled and only the latest m representative images are displayed. The user can press the scroll buttons 71 and 73 or drag the scroll bar 72 to display the scrolled-out representative image.
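The reduction step can be sketched as simple subsampling. In the embodiment the frame is 240 x 320 pixels and the representative image is 120 x 160, obtained by copying every second pixel in each direction. The sketch below assumes NumPy and applies the same subsampling to all three colour components, which is an assumption; the function name and the synthetic frame are hypothetical.

```python
import numpy as np

def reduce_half(frame):
    """Keep every second pixel in both directions (half-size reduction)."""
    return frame[::2, ::2].copy()

# Synthetic 240 x 320 RGB frame used only to show the resulting shape.
frame = np.arange(240 * 320 * 3, dtype=np.uint32).reshape(240, 320, 3)
thumb = reduce_half(frame)
print(thumb.shape)  # prints (120, 160, 3)
```

Subsampling without filtering is the simplest choice and matches the copy described in the flow; an averaging filter could be substituted if aliasing in the thumbnails matters.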
[0030]
FIG. 3 is a functional block diagram of the representative image extracting process.
The moving image input unit 100 loads the digital image data 12 into the memory 5 and displays the moving image on the area 50 of the display device 1.
The region-based luminance counting unit 200 of the feature extraction unit 150 detects high-luminance pixels equal to or greater than a first threshold in each region when the screen of each frame of the moving image is divided into a plurality of regions, and outputs the number of those pixels.
The region-based edge counting unit 300 of the feature extraction unit 150 detects edges equal to or greater than a second threshold in each region when the screen of each frame of the moving image is divided into a plurality of regions, and outputs the number of those edges.
The subtitle determination unit 400 determines an area in which the number of pixels and the number of edges are equal to or greater than the third threshold value as an area with subtitles, projects the number of subtitled areas in the row direction and the column direction, and projects in the row direction. When the maximum value of the number of regions with subtitles at that time or the maximum value of the number of regions with subtitles when projected in the column direction is equal to or greater than the fourth threshold value, it is determined that there is a subtitle in the image of the frame.
The representative image creation unit 500 reduces the image of the frame determined to have the subtitle and stores it in the memory 5 as a representative image.
The display unit 600 displays the plurality of reduced representative images and the extraction time in the area 70 of the display device 1 side by side.
[0031]
FIG. 4 is a configuration diagram of programs and data stored in the memory 5.
The program 5-1 is a program for a representative image extraction process. This program 5-1 refers to the following data 5-2 to data 5-27.
[0032]
The representative image structure 5-2 is a structure that stores a representative image and attached data (such as extraction time) (details are shown in FIG. 5). This representative image structure 5-2 is data to be accumulated as an extraction result.
[0033]
The threshold value 1 (5-3) is a first threshold value for detecting a high-luminance pixel.
The threshold value 2 (5-4) is a second threshold value for detecting a strong edge.
The threshold value 3 (5-5) is a third threshold value for determining a segmented area with subtitles.
The threshold value 4 (5-6) is a fourth threshold value for detecting a frame having a caption.
The threshold value 1 (5-3), the threshold value 2 (5-4), the threshold value 3 (5-5), and the threshold value 4 (5-6) are data set in advance.
[0034]
The following data 5-7 to data 5-27 are work data used for one process.
The image data 5-7 is digital image data of the current frame to be processed, and is [240] × [320] (= number of screen pixels: see FIG. 18) array data. Each array is composed of three types of color component data of red image data 5-7-1, green image data 5-7-2, and blue image data 5-7-3.
The luminance data 5-8 is [240] × [320] array data indicating the detection result of the high luminance pixels.
The horizontal edge data 5-9 is [240] × [320] array data indicating a detection result of a pixel having a large luminance difference in the horizontal direction of the screen (a pixel of a strong edge).
The vertical edge data 5-10 is [240] × [320] array data indicating a detection result of a pixel having a large luminance difference in the vertical direction of the screen (a pixel of a strong edge).
[0035]
The previous frame luminance data 5-11 is luminance data (5-8) of the previous frame of the current processing target frame.
The previous frame horizontal edge data 5-12 is horizontal edge data (5-9) of the previous frame of the current frame to be processed.
The previous frame vertical edge data 5-13 is the vertical edge data (5-10) of the previous frame of the current frame to be processed.
[0036]
The luminance collation data 5-14 is [240] × [320] array data that marks the pixels stored as high-luminance pixels in both the luminance data 5-8 and the previous frame luminance data 5-11.
The horizontal edge collation data 5-15 is [240] × [320] array data that marks the pixels stored as strong-edge pixels in both the horizontal edge data 5-9 and the previous frame horizontal edge data 5-12.
The vertical edge collation data 5-16 is [240] × [320] array data that marks the pixels stored as strong-edge pixels in both the vertical edge data 5-10 and the previous frame vertical edge data 5-13.
[0037]
The luminance area data 5-17 is array data storing the result of counting the number of high luminance pixels of the luminance collation data 5-14 for each area. This is [10] × [16] (= number of areas: see FIG. 18) array data. In the present embodiment, the screen is divided into [10] × [16] areas, but it is preferable to divide the screen into a size such that one subtitle character is included in one area.
The horizontal edge area data 5-18 is [10] × [16] array data that stores the result of counting the number of strong-edge pixels (number of edges) of the horizontal edge collation data 5-15 for each area.
The vertical edge area data 5-19 is [10] × [16] array data that stores the result of counting the number of strong-edge pixels (number of edges) of the vertical edge collation data 5-16 for each area.
The luminance data 5-8 to the vertical edge area data 5-19 are data created by the feature extracting unit 150.
[0038]
The subtitle area data 5-20 is [10] × [16] pieces of array data that stores the determination result of the presence or absence of subtitles for each area.
The subtitle attached data 5-21 is data on the position and direction of the subtitle when there is a subtitle.
The row count data 5-22 is [10] pieces of array data in which the number of areas with captions is stored for each row.
The maximum row count data 5-23 is data storing the maximum value of the array data of the row count data 5-22.
The maximum row position data 5-24 is data that stores the row number of the row corresponding to the maximum value in the array data of the row count data 5-22.
The column count data 5-25 is [16] pieces of array data in which the number of areas with captions is stored for each column.
The maximum column count data 5-26 is data storing the maximum value of the array data of the column count data 5-25.
The maximum column position data 5-27 is data storing the column number of the column corresponding to the maximum value in the array data of the column count data 5-25.
The previous caption area data 5-28 is caption area data (5-20) of the previous frame of the current frame to be processed.
The area match number 5-29 is the number of areas where the presence or absence of subtitles matches between the current frame to be processed and the previous frame.
The subtitle area data 5-20 through the area match number 5-29 are data created by the subtitle determination unit 400.
[0039]
FIG. 5 is a configuration diagram of the representative image structure 5-2.
The representative image identification number 5-2-1 is the order of the extracted representative images.
The representative image data 5-2-2 is array data obtained by reducing the extracted image. This is [120] × [160] (= 1/4 of the number of pixels on the screen) array data. Each array is composed of three types of color component data: red image data, green image data, and blue image data.
The representative image display position X (5-2-3) and the representative image display position Y (5-2-4) are X and Y coordinate positions when the representative image is displayed in the area 70.
The subtitle start time 5-2-5 is a time at which a subtitle relating to the representative image appears.
The subtitle end time 5-2-6 is the time at which the subtitle relating to the representative image has disappeared.
The caption format 5-2-7 is data on the display direction and position of the caption for the representative image.
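As an illustration only, the fields of the representative image structure 5-2 listed above could be mirrored by a small record type. The class name, field names, and field types below are hypothetical (the text gives only the reference numbers and meanings), and the image is simplified to a single two-dimensional list rather than three color components.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RepresentativeImage:
    """Hypothetical mirror of the representative image structure 5-2."""
    identification_number: int                        # 5-2-1: order of extraction
    image_data: List[List[int]] = field(default_factory=list)  # 5-2-2: reduced image
    display_x: int = 0                                # 5-2-3: X position in area 70
    display_y: int = 0                                # 5-2-4: Y position in area 70
    subtitle_start_time: Optional[float] = None       # 5-2-5: time the subtitle appeared
    subtitle_end_time: Optional[float] = None         # 5-2-6: time the subtitle disappeared
    caption_format: str = ""                          # 5-2-7: direction and position


# Example: a freshly created record with only its order and start time set.
record = RepresentativeImage(identification_number=1, subtitle_start_time=12.5)
```

Representing the times as seconds in a float is an assumption; the text does not specify a time format.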
[0040]
FIGS. 6, 7, and 8 are flowcharts showing the processing procedure in the region-based luminance counting unit 200.
In the process 201 of FIG. 6, the pixel horizontal position counter X and the pixel vertical position counter Y are initialized to “0”.
In the process 202, it is checked whether the luminance values of the array [Y] [X] of the red image data 5-7-1, the green image data 5-7-2, and the blue image data 5-7-3 are equal to or greater than the threshold value 1 (5-3). If the luminance values of all three colors are equal to or greater than the threshold value 1, the process proceeds to the process 203; otherwise, the process proceeds to the process 204.
In the process 203, “1” is written in the array [Y] [X] of the luminance data 5-8.
In the process 204, “0” is written to the array [Y] [X] of the luminance data 5-8.
Processes 205 to 209 are address update processes for performing the processes 202 to 204 on all the pixels. When the processes 202 to 204 have been performed on all the pixels and the luminance data 5-8 is complete, the process proceeds to the process 210 in FIG. 7.
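The per-pixel test of processes 202 to 204 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name is hypothetical, and the three color planes are assumed to be equally sized lists of rows of integers.

```python
from typing import List

Plane = List[List[int]]


def high_luminance_mask(red: Plane, green: Plane, blue: Plane,
                        threshold1: int) -> Plane:
    """Return a 0/1 array marking pixels whose R, G and B values are all
    equal to or greater than threshold1 (cf. luminance data 5-8)."""
    height, width = len(red), len(red[0])
    mask = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if (red[y][x] >= threshold1
                    and green[y][x] >= threshold1
                    and blue[y][x] >= threshold1):
                mask[y][x] = 1   # process 203
            # otherwise the entry stays 0 (process 204)
    return mask
```

Requiring all three components to pass the threshold means that a saturated single-color pixel, such as pure red, is not marked.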
[0041]
In the process 210 of FIG. 7, the pixel horizontal position counter X and the pixel vertical position counter Y are initialized to “0”.
In the process 211, it is checked whether both the value of the array [Y] [X] of the luminance data 5-8 and the value of the array [Y] [X] of the previous frame luminance data 5-11 are "1". If both are "1", the process proceeds to step 212; otherwise, the process proceeds to step 213.
In the process 212, “1” is written to the array [Y] [X] of the luminance collation data 5-14.
In the process 213, “0” is written to the array [Y] [X] of the luminance collation data 5-14.
Processes 214 to 218 are address update processes for performing the processes 211 to 213 on all pixels. When the processes 211 to 213 have been performed on all the pixels and the luminance collation data 5-14 is complete, the process proceeds to the process 219.
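The collation of processes 211 to 213 amounts to a per-pixel logical AND of the current and previous 0/1 arrays. A minimal sketch, with a hypothetical function name and the arrays assumed to be equally sized lists of rows:

```python
from typing import List

Plane = List[List[int]]


def collate(current: Plane, previous: Plane) -> Plane:
    """Mark with 1 only the pixels that are 1 in both the current frame
    and the previous frame (cf. luminance collation data 5-14)."""
    return [
        [1 if cur == 1 and prev == 1 else 0
         for cur, prev in zip(cur_row, prev_row)]
        for cur_row, prev_row in zip(current, previous)
    ]
```

Because a pixel must be marked in two consecutive frames to survive, bright content that moves between frames tends to be suppressed while a stationary caption is kept.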
[0042]
In process 219, the pixel horizontal position counter X and the pixel vertical position counter Y are initialized to “0”.
In the process 220, the contents of the array [Y] [X] of the luminance data 5-8 are copied to the array [Y] [X] of the previous frame luminance data 5-11.
Processes 221 to 225 are address update processes for performing the process 220 for all pixels. When the process 220 has been performed on all the pixels and the update of the previous frame luminance data 5-11 is complete, the process proceeds to the process 226 in FIG. 8.
[0043]
In the process 226 in FIG. 8, the in-region pixel horizontal position counter i, the in-region pixel vertical position counter j, the region horizontal position counter Xb, and the region vertical position counter Yb are initialized to “0”. Also, the luminance area data 5-17 is initialized to “0”.
In the process 227, it is checked whether or not the content of the array [Yb * 24 + j] [Xb * 20 + i] of the luminance collation data 5-14 is "1". If it is "1", the process proceeds to the process 228; otherwise, the process proceeds to the process 229.
In the process 228, “1” is added to the array [Yb] [Xb] of the luminance area data 5-17.
Processes 229 to 239 are address update processes for performing the processes 227 and 228 for all pixels. When the processing 227 and the processing 228 are performed for all the pixels to complete the creation of the brightness area data 5-17, the processing in the area-by-area brightness counting unit 200 ends.
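The region-by-region counting of processes 226 to 239 can be sketched as below. The function name and default arguments are hypothetical; the region size is derived from the array size, which gives 24 × 20 pixels per region for a [240] × [320] array divided into [10] × [16] regions, matching the index [Yb * 24 + j] [Xb * 20 + i] in the text.

```python
from typing import List

Plane = List[List[int]]


def count_per_region(mask: Plane, rows: int = 10, cols: int = 16) -> Plane:
    """Count the 1-entries of a 0/1 array inside each of rows x cols
    equally sized regions (cf. luminance area data 5-17)."""
    height, width = len(mask), len(mask[0])
    region_h, region_w = height // rows, width // cols   # 24 and 20 for 240 x 320
    counts = [[0] * cols for _ in range(rows)]
    for yb in range(rows):
        for xb in range(cols):
            for j in range(region_h):
                for i in range(region_w):
                    if mask[yb * region_h + j][xb * region_w + i] == 1:
                        counts[yb][xb] += 1
    return counts
```

The same routine applies unchanged to the horizontal and vertical edge collation data, since they are 0/1 arrays of the same size.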
[0044]
FIGS. 9, 10, and 11 are flowcharts showing the processing procedure in the region-based edge counting unit 300.
In the process 301 of FIG. 9, the pixel horizontal position counter X and the pixel vertical position counter Y are initialized to “1”.
In the process 302, it is checked whether the difference between the luminance values of the array [Y] [X + 1] and the array [Y] [X-1] of the red image data 5-7-1, the green image data 5-7-2, and the blue image data 5-7-3 is equal to or greater than the threshold value 2 (5-4). If the difference is equal to or greater than the threshold value 2 for all three colors, the process proceeds to the process 303; otherwise, the process proceeds to the process 304.
In the process 303, “1” is written in the array [Y] [X] of the horizontal edge data 5-9 (FIG. 4).
In the process 304, “0” is written to the array [Y] [X] of the horizontal edge data 5-9 (FIG. 4).
In the process 305, it is checked whether the difference between the luminance values of the array [Y + 1] [X] and the array [Y-1] [X] of the red image data 5-7-1, the green image data 5-7-2, and the blue image data 5-7-3 is equal to or greater than the threshold value 2 (5-4). If the difference is equal to or greater than the threshold value 2 for all three colors, the process proceeds to the process 306; otherwise, the process proceeds to the process 307.
In the process 306, “1” is written in the array [Y] [X] of the vertical edge data 5-10 (FIG. 4).
In the process 307, “0” is written in the array [Y] [X] of the vertical edge data 5-10 (FIG. 4).
Processes 308 to 312 are address update processes for performing the processes 302 to 307 for all pixels. When the processes 302 to 307 have been performed on all the pixels except those at the edges of the screen, completing the horizontal edge data 5-9 and the vertical edge data 5-10, the process proceeds to the process 313 in FIG. 10.
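The edge tests of processes 302 to 307 can be sketched as follows. This is an illustration under stated assumptions: the function name is hypothetical, and the "difference" is taken as an absolute difference, which the text does not state explicitly. Border pixels are skipped, as in the text, because they lack a neighbor on one side.

```python
from typing import List, Tuple

Plane = List[List[int]]


def edge_masks(red: Plane, green: Plane, blue: Plane,
               threshold2: int) -> Tuple[Plane, Plane]:
    """Return (horizontal, vertical) 0/1 arrays marking strong edges
    (cf. horizontal edge data 5-9 and vertical edge data 5-10)."""
    height, width = len(red), len(red[0])
    horizontal = [[0] * width for _ in range(height)]
    vertical = [[0] * width for _ in range(height)]
    planes = (red, green, blue)
    for y in range(1, height - 1):          # skip top and bottom rows
        for x in range(1, width - 1):       # skip left and right columns
            # Left/right neighbors: luminance difference in the horizontal direction.
            if all(abs(p[y][x + 1] - p[y][x - 1]) >= threshold2 for p in planes):
                horizontal[y][x] = 1
            # Upper/lower neighbors: luminance difference in the vertical direction.
            if all(abs(p[y + 1][x] - p[y - 1][x]) >= threshold2 for p in planes):
                vertical[y][x] = 1
    return horizontal, vertical
```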
[0045]
In the process 313 of FIG. 10, the pixel horizontal position counter X and the pixel vertical position counter Y are initialized to “0”.
In the process 314, it is checked whether both the value of the array [Y] [X] of the horizontal edge data 5-9 and the value of the array [Y] [X] of the previous frame horizontal edge data 5-12 are "1". If both are "1", the process proceeds to step 315; otherwise, the process proceeds to step 316.
In the process 315, “1” is written into the array [Y] [X] of the horizontal edge collation data 5-15.
In the process 316, “0” is written in the array [Y] [X] of the horizontal edge collation data 5-15.
In the process 317, it is determined whether the value of the array [Y] [X] of the vertical edge data 5-10 and the value of the array [Y] [X] of the previous frame vertical edge data 5-13 are both "1". The process proceeds to step 318 if both are “1”, and to step 319 otherwise.
In the process 318, “1” is written into the array [Y] [X] of the vertical edge collation data 5-16.
In process 319, “0” is written to the array [Y] [X] of the vertical edge collation data 5-16.
Processes 320 to 324 are address update processes for performing the processes 314 to 319 for all pixels. When the processes 314 to 319 are performed on all the pixels to complete the creation of the horizontal edge collation data 5-15 and the vertical edge collation data 5-16, the process proceeds to a process 325.
[0046]
In the process 325, the pixel horizontal position counter X and the pixel vertical position counter Y are initialized to “0”.
In the process 326, the contents of the array [Y] [X] of the horizontal edge data 5-9 are copied to the array [Y] [X] of the previous frame horizontal edge data 5-12, and the contents of the array [Y] [X] of the vertical edge data 5-10 are copied to the array [Y] [X] of the previous frame vertical edge data 5-13.
Processes 327 to 331 are address update processes for performing the process 326 for all pixels. When the process 326 has been performed on all the pixels and the previous frame horizontal edge data 5-12 and the previous frame vertical edge data 5-13 are updated, the process proceeds to the process 332 in FIG. 11.
[0047]
In the process 332 in FIG. 11, the in-region pixel horizontal position counter i, the in-region pixel vertical position counter j, the region horizontal position counter Xb, and the region vertical position counter Yb are initialized to “0”. Further, the horizontal edge area data 5-18 and the vertical edge area data 5-19 are initialized to "0".
In the process 333, it is checked whether or not the content of the array [Yb * 24 + j] [Xb * 20 + i] of the horizontal edge collation data 5-15 is "1". If it is "1", the process proceeds to the process 334; otherwise, the process proceeds to the process 335.
In the process 334, “1” is added to the array [Yb] [Xb] of the horizontal edge area data 5-18.
In the process 335, it is checked whether or not the content of the array [Yb * 24 + j] [Xb * 20 + i] of the vertical edge collation data 5-16 is "1". If it is "1", the process proceeds to the process 336; otherwise, the process proceeds to the process 337.
In the process 336, “1” is added to the array [Yb] [Xb] of the vertical edge area data 5-19.
Processes 337 to 348 are address update processes for performing the processes 333 to 336 for all pixels. When the processes 333 to 336 are performed on all the pixels to complete the creation of the horizontal edge region data 5-18 and the vertical edge region data 5-19, the process in the region-specific edge counting unit 300 ends.
[0048]
FIGS. 12, 13, and 14 are flowcharts illustrating processing procedures in the subtitle determination unit 400 and the representative image creation unit 500. Note that the processes of the subtitle determination unit 400 are indicated by reference numbers 4xx, and the processes of the representative image creation unit 500 are indicated by reference numbers 5xx.
In the process 401 of FIG. 12, the area horizontal position counter Xb and the area vertical position counter Yb are initialized to “0”.
In the process 402, it is checked whether the value of the array [Yb] [Xb] of the luminance area data 5-17, the value of the array [Yb] [Xb] of the horizontal edge area data 5-18, and the value of the array [Yb] [Xb] of the vertical edge area data 5-19 are all equal to or greater than the threshold value 3 (5-5). If all are equal to or greater than the threshold value 3, the process proceeds to the process 403; otherwise, the process proceeds to the process 404.
In the process 403, “1” is written to the array [Yb] [Xb] of the caption area data 5-20. The area corresponding to the array in which “1” is written is the area with subtitles.
In the process 404, “0” is written to the array [Yb] [Xb] of the subtitle area data 5-20. The area corresponding to the array in which “0” is written is the area without subtitles.
Processes 405 to 409 are address update processes for performing the processes 402 to 404 for all areas. When the processes 402 to 404 have been performed on all the areas and the subtitle area data 5-20 is complete, the process proceeds to the process 410 in FIG. 13.
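The per-area decision of processes 402 to 404 can be sketched as follows. The function name is hypothetical, and the three inputs are assumed to be equally sized lists of rows holding the per-area counts.

```python
from typing import List

Grid = List[List[int]]


def subtitle_areas(luminance_counts: Grid, h_edge_counts: Grid,
                   v_edge_counts: Grid, threshold3: int) -> Grid:
    """Mark an area with 1 when its high-luminance pixel count, horizontal
    edge count and vertical edge count all reach threshold3
    (cf. subtitle area data 5-20)."""
    return [
        [1 if lum >= threshold3 and h >= threshold3 and v >= threshold3 else 0
         for lum, h, v in zip(lum_row, h_row, v_row)]
        for lum_row, h_row, v_row in zip(luminance_counts,
                                         h_edge_counts, v_edge_counts)
    ]
```

As described in the text, a single threshold value 3 is applied to all three counts.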
[0049]
In processing 410 of FIG. 13, the area horizontal position counter Xb and the area vertical position counter Yb are initialized to “0”. Further, the row count data 5-22 is initialized to “0”.
In the process 411, the contents of the subtitle area data array [Yb] [Xb] are added to the array [Yb] of the row count data 5-22.
Processes 412 to 416 are address update processes for performing the process 411 for all areas. When the process 411 is performed on all the areas to complete the creation of the row count data 5-22, the process proceeds to a process 417.
In process 417, the area horizontal position counter Xb and the area vertical position counter Yb are initialized to “0”. Also, the column count data 5-25 is initialized to "0".
In the process 418, the contents of the subtitle area data array [Yb] [Xb] are added to the array [Xb] of the column count data 5-25.
Processes 419 to 423 are address update processes for performing the process 418 for all areas. When the process 418 has been performed on all the areas and the column count data 5-25 is complete, the process proceeds to the process 424 in FIG. 14.
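The projections of processes 410 to 423 are row sums and column sums of the 0/1 subtitle area data. A minimal sketch with a hypothetical function name:

```python
from typing import List, Tuple

Grid = List[List[int]]


def project(areas: Grid) -> Tuple[List[int], List[int]]:
    """Return (row_counts, column_counts): the number of areas with
    captions in each row and in each column
    (cf. row count data 5-22 and column count data 5-25)."""
    row_counts = [sum(row) for row in areas]
    column_counts = [sum(column) for column in zip(*areas)]
    return row_counts, column_counts
```

A horizontally written caption produces one large entry in the row counts, while a vertically written caption produces one large entry in the column counts.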
[0050]
In the process 424 in FIG. 14, the area horizontal position counter Xb and the area vertical position counter Yb are initialized to “0”. Further, the maximum row count data 5-23 and the maximum column count data 5-26 are initialized to “0”.
In the process 425, it is checked whether or not the value of the array [Yb] of the row count data 5-22 is larger than the maximum row count data 5-23. If the value is larger, the process proceeds to the process 426; otherwise, the process proceeds to the process 428.
In the process 426, the value of the array [Yb] of the row count data 5-22 is copied to the maximum row count data 5-23.
In the process 427, the value of “Yb” is stored in the maximum row position data 5-24.
Processes 428 and 429 are address update processes for performing the processes 425 to 427 for all rows. When the processes 425 to 427 are performed on all the rows to complete the creation of the maximum row count data 5-23 and the maximum row position data 5-24, the process proceeds to the process 430.
In the process 430, it is checked whether or not the value of the array [Xb] of the column count data 5-25 is larger than the maximum column count data 5-26. If the value is larger, the process proceeds to the process 431; otherwise, the process proceeds to the process 433.
In the process 431, the value of the array [Xb] of the column count data 5-25 is copied to the maximum column count data 5-26.
In the process 432, the value of “Xb” is stored in the maximum column position data 5-27.
Processes 433 and 434 are address update processes for performing the processes 430 to 432 for all columns. When the processes 430 to 432 are performed on all the columns to create the maximum column count data 5-26 and the maximum column position data 5-27, the process proceeds to a process 435.
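The search for the maximum count and its position (processes 424 to 434) can be sketched as a single pass with a strict "larger than" comparison. The function name is hypothetical, and initializing the position to 0 is an assumption; the text only states that the maximum is initialized to “0”.

```python
from typing import List, Tuple


def max_and_position(counts: List[int]) -> Tuple[int, int]:
    """Return (maximum, position) for a list of per-row or per-column
    counts (cf. data 5-23/5-24 and 5-26/5-27)."""
    maximum, position = 0, 0
    for index, value in enumerate(counts):
        if value > maximum:      # strict comparison, as in processes 425 and 430
            maximum = value      # processes 426 / 431
            position = index     # processes 427 / 432
    return maximum, position
```

Because the comparison is strict, when several rows or columns tie for the maximum, the first one encountered is kept.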
[0051]
In the process 435, it is checked whether or not the maximum row count data 5-23 is equal to or larger than the threshold value 4 (5-6) or the maximum column count data 5-26 is equal to or larger than the threshold value 4. If the maximum row count data 5-23 is greater than or equal to the threshold value 4 or the maximum column count data 5-26 is greater than or equal to the threshold value 4, it is determined that there is a caption in the image of the frame, and the process proceeds to the process 436. If the maximum row count data 5-23 is less than the threshold value 4 and the maximum column count data 5-26 is less than the threshold value 4, it is determined that there is no caption in the image of the frame, and the process proceeds to the process 471 in FIG. 17.
In the process 436, it is checked whether or not the maximum row count data 5-23 is greater than or equal to the maximum column count data 5-26. If the maximum row count data 5-23 is equal to or larger than the maximum column count data 5-26, it is determined that "subtitles are written horizontally", and the process proceeds to the process 437. Otherwise, it is determined that "subtitles are written vertically", and the process proceeds to the process 440.
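The two decisions of processes 435 and 436 can be sketched as one small function. The function name and the returned labels are hypothetical; the comparisons follow the text of this embodiment, including "greater than or equal to" for the horizontal case.

```python
from typing import Optional


def classify_frame(max_row_count: int, max_column_count: int,
                   threshold4: int) -> Optional[str]:
    """Return None when the frame has no caption, otherwise the writing
    direction, "horizontal" or "vertical"."""
    # Process 435: a caption exists if either maximum reaches threshold 4.
    if max_row_count < threshold4 and max_column_count < threshold4:
        return None
    # Process 436: a long run along one row suggests horizontal writing.
    if max_row_count >= max_column_count:
        return "horizontal"
    return "vertical"
```

For example, with a threshold value 4 of 3, maxima of 6 (row) and 1 (column) give "horizontal", while maxima of 1 and 5 give "vertical".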
[0052]
In the process 437, it is checked whether or not the maximum row position data 5-24 is equal to or greater than the "5"th row (the middle row of the screen). If it is equal to or greater than “5”, it is determined that “subtitles are written horizontally in the upper half of the screen”, and the process proceeds to the process 438. If it is less than “5”, it is determined that “subtitles are written horizontally in the lower half of the screen”, and the process proceeds to the process 439.
In the process 438, “upper horizontal writing” is written in the subtitle attached data 5-21.
In the process 439, "lower horizontal writing" is written in the subtitle attached data 5-21. Then, the process proceeds to the process 451 in FIG. 15.
[0053]
On the other hand, in the process 440, it is checked whether or not the maximum column position data 5-27 is equal to or greater than the "8"th column (the center column of the screen). If it is equal to or greater than “8”, it is determined that “subtitles are written vertically in the right half of the screen”, and the process proceeds to the process 441. If it is less than “8”, it is determined that “subtitles are written vertically in the left half of the screen”, and the process proceeds to the process 442.
In the process 441, “right vertical writing” is written in the subtitle attached data 5-21.
In the process 442, "left vertical writing" is written in the subtitle attached data 5-21. Then, the process proceeds to the process 451 in FIG. 15.
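The position labels written by processes 438, 439, 441, and 442 can be sketched as a lookup on the maximum row or column position. The function and parameter names are hypothetical. The mapping of "5 or greater" to the upper label follows the process numbers in the text; which physical half of the screen that corresponds to depends on the row numbering of FIG. 18, which is not reproduced here.

```python
def caption_format(direction: str, max_row_position: int,
                   max_column_position: int,
                   middle_row: int = 5, center_column: int = 8) -> str:
    """Return the label stored in the subtitle attached data 5-21.

    direction is "horizontal" or "vertical" (hypothetical labels)."""
    if direction == "horizontal":
        if max_row_position >= middle_row:       # process 437 -> 438
            return "upper horizontal writing"
        return "lower horizontal writing"        # process 439
    if max_column_position >= center_column:     # process 440 -> 441
        return "right vertical writing"
    return "left vertical writing"               # process 442
```

The defaults 5 and 8 are the middle row and center column of the [10] × [16] area grid used in this embodiment.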
[0054]
In the process 451 of FIG. 15, the area horizontal position counter Xb and the area vertical position counter Yb are initialized to “0”. Also, the number of area matches 5-29 is initialized to "0".
In the process 452, it is checked whether or not the value of the array [Yb] [Xb] of the subtitle area data 5-20 matches the value of the array [Yb] [Xb] of the previous subtitle area data 5-28. If they match, the process proceeds to the process 453; if they do not match, the process proceeds to the process 454.
In the process 453, “1” is added to the area matching number 5-29.
Processes 454 to 458 are address update processes for performing the processes 452 and 453 for all areas. When the processing 452 and the processing 453 are performed on all the areas to complete the creation of the area matching number 5-29, the processing shifts to the processing 459.
[0055]
In the process 459, the degree of coincidence is obtained by dividing the area match number 5-29 by the number of areas, "160", and it is checked whether the degree of coincidence is less than "0.7". If the degree of coincidence is less than “0.7”, it is determined that the caption has changed, and the process proceeds to the process 501. If the degree of coincidence is equal to or more than "0.7", it is determined that the caption has not changed, and the process proceeds to the process 461 in FIG. 16. In the present embodiment, the threshold value of the degree of coincidence is set to “0.7”, but it can be set arbitrarily.
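The coincidence check of processes 451 to 459 can be sketched as follows. The function name is hypothetical; the default threshold of 0.7 is the value used in this embodiment, and the number of areas is taken from the array itself (160 for a [10] × [16] grid).

```python
from typing import List

Grid = List[List[int]]


def caption_changed(current: Grid, previous: Grid,
                    threshold: float = 0.7) -> bool:
    """Return True when the fraction of areas whose caption flag matches
    between the current and previous frames is below the threshold."""
    total = sum(len(row) for row in current)
    matches = sum(
        1
        for cur_row, prev_row in zip(current, previous)
        for cur, prev in zip(cur_row, prev_row)
        if cur == prev
    )
    return matches / total < threshold
```

Note that areas without captions in both frames also count as matches, so a small caption change on a mostly empty grid leaves the degree of coincidence high.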
In the process 501, a new representative image structure 5-2 is generated, and its representative image identification number 5-2-1 is set to the value obtained by adding “1” to the representative image identification number 5-2-1 of the previously generated representative image structure 5-2. Also, the current time is stored in the subtitle start time 5-2-5, and the subtitle attached data 5-21 is copied to the caption format 5-2-7.
In process 502, a pixel horizontal position counter X and a pixel vertical position counter Y are initialized to “0”.
In process 503, the luminance value of the array [Y * 2] [X * 2] of the green image data 5-7-2 is copied to the array [Y] [X] of the representative image data 5-2-2.
Processes 504 to 508 are address update processes for performing the process 503 on all the pixels of the representative image. When the process 503 has been performed on all the pixels of the representative image and the representative image data 5-2-2 is complete, the process proceeds to the process 461 in FIG. 16. Note that the representative image data 5-2-2 is a 1/2 reduced image of the green image data 5-7-2.
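The 1/2 reduction of process 503 takes every second pixel in each direction, without averaging. A minimal sketch with a hypothetical function name, applied to a single plane as in the text:

```python
from typing import List

Plane = List[List[int]]


def reduce_half(plane: Plane) -> Plane:
    """Copy plane[Y * 2][X * 2] to output[Y][X], halving both dimensions
    (a [240] x [320] plane becomes [120] x [160])."""
    out_height, out_width = len(plane) // 2, len(plane[0]) // 2
    return [
        [plane[y * 2][x * 2] for x in range(out_width)]
        for y in range(out_height)
    ]
```

Plain subsampling of this kind is cheap but can alias fine detail; the text does not describe any filtering before the reduction.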
[0056]
In the process 461 of FIG. 16, the area horizontal position counter Xb and the area vertical position counter Yb are initialized to “0”.
In the process 462, the value of the array [Yb] [Xb] of the subtitle area data 5-20 is copied to the array [Yb] [Xb] of the previous subtitle area data 5-28.
Processes 463 to 467 are address update processes for performing the process 462 for all areas. When the above processing 462 is performed for all the areas and the update of the previous subtitle area data 5-28 is completed, the processing moves to processing 468.
In the process 468, the current time is stored in the subtitle end time 5-2-6 of the representative image structure 5-2. Then, the processing in subtitle determination section 400 ends.
[0057]
On the other hand, in a process 471 of FIG. 17, the area horizontal position counter Xb and the area vertical position counter Yb are initialized to “0”.
In the process 472, “0” is stored in the array [Yb] [Xb] of the previous caption area data 5-28.
Processes 473 to 477 are address update processes for performing the process 472 for all areas. When the above-described process 472 is performed on all the regions and the update of the previous subtitle region data 5-28 is completed, the process in the subtitle determination unit 400 ends.
[0058]
According to the above-described representative image extracting apparatus 1000 for a moving image, the feature extraction unit 150 examines each region for the features of subtitles, so that subtitles can be suitably detected. In addition, since the feature extraction unit 150 checks both high-luminance pixels and strong-edge pixels as subtitle features, a high-luminance background without edges, such as illumination light, and a low-luminance background with edges, such as a shogi board, are distinguished from a caption, so that erroneous extraction can be prevented. In addition, since the subtitle determination unit 400 projects the information on the presence or absence of subtitles in each region in the row direction and the column direction, vertically written and horizontally written subtitles can be distinguished. Furthermore, since the representative image creation unit 500 creates reduced representative images and the display unit 600 displays a plurality of them as a list, searching for a representative image is facilitated.
[0059]
[Effects of the Invention]
According to the caption detection method of the present invention, it is possible to determine whether a caption exists in a general image in which the display mode of the caption is arbitrary.
In addition, even when the image itself changes little and only the caption changes, the necessary representative images can be extracted.
[Brief description of the drawings]
FIG. 1 is a system configuration diagram of an apparatus for extracting a representative image of a moving image according to an embodiment of the present invention.
FIG. 2 is an exemplary view of a screen displayed on a display device.
FIG. 3 is a functional block diagram of a representative image extraction process.
FIG. 4 is a configuration diagram of programs and data stored in a memory.
FIG. 5 is a configuration diagram of a representative image structure.
FIG. 6 is a flowchart of a process of extracting a high-luminance pixel in a region-specific luminance counting unit.
FIG. 7 is a flowchart of a process of extracting pixels in which high luminance continues over a plurality of frames in a region-specific luminance counting unit.
FIG. 8 is a flowchart of a process of counting the number of high-luminance pixels for each area in an area-by-area brightness counting unit.
FIG. 9 is a flowchart of a process of extracting pixels of a vertical edge and a horizontal edge in a region-based edge counting unit.
FIG. 10 is a flowchart of a process of extracting pixels in which a strong edge continues over a plurality of frames in the region-based edge counting unit.
FIG. 11 is a flowchart of a process of counting the number of vertical edges and the number of horizontal edges for each area in an area-by-area edge counting unit.
FIG. 12 is a flowchart of a process of determining the presence or absence of subtitles for each area in a subtitle determination unit.
FIG. 13 is a flowchart of a process of projecting an area having a caption in a row direction and a column direction in a caption determination unit.
FIG. 14 is a flowchart of a process of determining an image with a subtitle in a subtitle determination unit.
FIG. 15 is a flowchart illustrating a process of determining the continuity of an image having a caption in a caption determining unit.
FIG. 16 is a flowchart illustrating a continuation of a process of determining the continuity of an image having a caption in the caption determining unit.
FIG. 17 is a flowchart of a process for an image without subtitles in a subtitle determination unit.
FIG. 18 is an explanatory diagram of a screen divided into a plurality of areas.
[Explanation of symbols]
1: display device, 2: speaker, 3: computer, 4: CPU,
5: memory, 6: interface, 7: pointing device,
8: keyboard, 9: video player, 10: control signal,
11: video input device, 12: digital image data,
13: external information storage device,
100: moving image input unit, 150: feature extraction unit, 200: region-based luminance counting unit,
300: region-based edge counting unit, 400: subtitle determination unit,
500: representative image creation unit, 600: display unit,
1000: representative image extraction device for moving images.

Claims

1. A caption detection method characterized in that an image is divided into a plurality of regions; for each region, the number of high-luminance pixels equal to or greater than a first threshold and the number of edges having a luminance difference equal to or greater than a second threshold are counted; a region in which the number of pixels is equal to or greater than a third threshold and the number of edges is equal to or greater than the third threshold is determined to be a region with captions; the number of regions with captions is projected in the row direction and the column direction; and it is determined that a caption is present in the image when the maximum number of regions with captions in the row-direction projection or the maximum number of regions with captions in the column-direction projection is equal to or greater than a fourth threshold.

2. The caption detection method according to claim 1, wherein the high-luminance pixels and the edges that have existed at the same location for at least two consecutive past frames are counted.

3. The caption detection method according to claim 1 or claim 2, wherein edges whose luminance difference in the horizontal direction is equal to or greater than the second threshold and edges whose luminance difference in the vertical direction is equal to or greater than the second threshold are counted.

4. The caption detection method according to claim 1, wherein the caption is determined to be written horizontally when the maximum number of regions with captions in the row-direction projection is larger than the maximum number of regions with captions in the column-direction projection, and is otherwise determined to be written vertically.

5. The caption detection method according to any one of claims 1 to 4, wherein a representative image is selected from images determined to have captions.

6. The caption detection method according to claim 5, wherein when the images determined to have captions are temporally continuous frames, only one of the frames is selected as a representative image.

7. The caption detection method according to claim 5, wherein each of the extracted representative images is reduced and displayed side by side on a screen.