JP2010152800A

JP2010152800A - Image processing apparatus, image processing method and program

Info

Publication number: JP2010152800A
Application number: JP2008332311A
Authority: JP
Inventors: Tomohiko Takahashi; 知彦高橋; Masaru Sugano; 勝菅野
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2008-12-26
Filing date: 2008-12-26
Publication date: 2010-07-08
Anticipated expiration: 2028-12-26
Also published as: JP5090330B2

Abstract

PROBLEM TO BE SOLVED: To accurately recognize only a search window area in an image by analyzing the image displayed on a screen.

SOLUTION: The image processing apparatus 10 for detecting a character input area contained in an image includes: a polygon extraction part 11 which extracts images similar to a shape of character input area from the image displayed on the screen; a shape discrimination part 12 which extracts images having the shape of character input area from the images similar to the shape of character input area extracted by the polygon extraction part 11; and a character arrangement discrimination part 14 which extracts, from the images having the shape of character input area extracted by the shape discrimination part 12, an image having the shape of character input area in which the arrangement of characters existing in the image of the character input area is similar to the arrangement of characters to be input to a character input area in keyword search. The image having the shape of character input area extracted by the character arrangement discrimination part 14 is output.

COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to a technique for detecting the image of a character input area contained in video displayed on a screen.

Character recognition techniques for recognizing characters in a given image are well known. For example, in the character recognition device disclosed in Patent Document 1, a document is read by a scanner, the resulting image is processed by an image processing unit, and a character string conversion unit converts it into character strings. A feature extraction unit extracts character features from them. A matching unit compares the features with a dictionary and outputs the most similar category. A character type determination unit placed before the feature extraction unit classifies each input character by size. The dictionary is organized by the same classification. Taking the maximum character height H in the string as a reference, the character type determination unit classifies characters whose size is H/th square or less as "small characters". The matching unit then matches only against the dictionary that corresponds to the classification obtained by the type determination.

Meanwhile, it has become common in recent years for TV commercials (CMs) to show the search window of an Internet search engine with a keyword entered in it, prompting viewers to search the Internet. In such commercials the search window functions as a kind of character input area and encourages the user to enter the characters and search. The keyword shown in the search window has been arranged in advance so that the advertiser's site appears at the top of the search results, so providing a mechanism for searching easily with this keyword is highly valuable.

Therefore, a technique that accurately detects only the search window area containing the desired characters on the screen could be combined with the character recognition technique above to obtain the characters in the search window area. However, no technique has existed for identifying only the search window area in television video and detecting it accurately. Television video does have another means of presenting text, namely telops (on-screen captions). The following techniques, for example, are known for recognizing telop areas in television programs.

Patent Document 2 detects telops using the features that "(1) a telop is displayed in a predetermined area at the top or bottom of the screen" and "(2) a luminance change occurs when a telop appears and when it ends". Patent Document 3 uses the feature that "(3) a telop has higher luminance than the background image" and detects telops from a combination of detected edges and a binarized luminance image. Patent Document 4 detects telops using the features that "(4) a telop area keeps a stable luminance state for a certain time compared with natural images" and "(5) a telop area shows a more uniform luminance level than natural images".

Patent documents cited:
Japanese Patent Laid-Open No. 5-28318
Japanese Patent Laid-Open No. 12-23062
Japanese Patent Laid-Open No. 12-182053
Japanese Patent Laid-Open No. 10-233994

However, the search window area, as a character input area, is not necessarily displayed in a predetermined area at the top or bottom of the screen, so feature (1) cannot be used. A luminance change does not always occur when the search window area is displayed, so feature (2) cannot be used. The search window area is not always brighter than the background image, so feature (3) cannot be used. Feature (4) cannot be used either, because in quite a few commercials with a search window area the entire screen becomes a still image. Feature (5) does tend to hold for search window areas, but detection based on this feature alone does not achieve sufficient accuracy: objects in the television video that resemble a search window area are falsely detected as one.

The present invention has been made in view of these circumstances. Its object is to provide an image processing apparatus, an image processing method, and a program that can accurately recognize only the search window area in television video by analyzing that video.

Including the search keyword as subtitles is an easily conceivable alternative means of solving the problem addressed by the present invention. However, as of November 2008, commercial broadcasts do not include subtitle information, so the search keyword exists only as video and there is no means of providing it to the user as text.

(1) To achieve the above object, the present invention takes the following measures. The image processing apparatus of the present invention detects the image of a character input area contained in video displayed on a screen, and comprises: an extraction unit that extracts, from the video displayed on the screen, images that approximate the shape of a character input area; a shape determination unit that extracts, from the images extracted by the extraction unit, images that have the shape of a character input area; and a character arrangement determination unit that extracts, from the images extracted by the shape determination unit, an image in which the arrangement of the characters present in the character input area approximates the arrangement of characters that would be entered in a character input area for a keyword search.

With this configuration, only the character input area in the video can be detected with high accuracy. The character arrangement determination unit extracts the arrangement of character information within the character input area. This arrangement determination is not character recognition (that is, matching characters against individual dictionaries). It is a lightweight process that only detects the presence of features commonly found in characters and examines how they are arranged. By excluding candidates that are not character input areas through this arrangement determination, the character input area can be extracted accurately and unnecessary character recognition can be avoided.

(2) In the image processing apparatus of the present invention, the extraction unit extracts images approximating the shape of a character input area from the video displayed on the screen by performing polygon approximation.

Because extraction based on the shape expected of a character input area runs before processes such as color histogram computation and determination of the character arrangement within the area, the number of character input area candidates is greatly reduced, which lightens the processing load and speeds up operation.

(3) In the image processing apparatus of the present invention, the extraction unit extracts, as character input area candidates, those polygons with four or more vertices in the polygon approximation result whose two longest straight lines are horizontal with respect to the screen.

Because only polygons with four or more vertices whose two longest straight lines are horizontal with respect to the screen are extracted as character input area candidates, the processing load is reduced and operation is faster.

(4) In the image processing apparatus of the present invention, the extraction unit extracts, as character input area candidates, those polygons with four or more vertices in the polygon approximation result whose two longest straight lines are horizontal with respect to the screen and equal in length.

Because only polygons with four or more vertices whose two longest straight lines are horizontal with respect to the screen and equal in length are extracted as character input area candidates, the processing load is reduced and operation is faster.

(5) In the image processing apparatus of the present invention, the extraction unit obtains the height of each polygon in the polygon approximation result from the distance between its two straight lines that are horizontal with respect to the screen, and extracts, as character input area candidates, regions whose height is no more than a predetermined ratio of the polygon's horizontal length.

Because the height of each polygon is obtained from the distance between its two straight lines that are horizontal with respect to the screen, and only regions whose height is no more than a predetermined ratio of the polygon's horizontal length are extracted as character input area candidates, the processing load is reduced and operation is faster. Here, "the height of the polygon is no more than a predetermined ratio of its horizontal length" can be defined, for example, as the height being 1/3 or less of the horizontal length. In this way, only shapes that approximate the commonly seen character input area for searching are extracted.

(6) In the image processing apparatus of the present invention, the character arrangement determination unit calculates a color histogram for each region obtained by the polygon approximation, determines the background color and the character color from the frequencies, and determines from the frequency of the character color whether the region is a character input area.

Because the background color and character color are determined from the color histogram of each region obtained by the polygon approximation, and the frequency of the character color is used to decide whether the region is a character input area, the character input area can be extracted with high accuracy.
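The histogram test described in (6) could be sketched roughly as follows. This is only an illustration under stated assumptions: the source does not say how the background and character colors are picked from the histogram, so the sketch assumes the most frequent quantized color is the background and the second most frequent is the character color. The function names, the quantization step, and the ratio bounds `min_ratio` and `max_ratio` are all hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

Color = Tuple[int, int, int]


def quantize(color: Color, step: int = 32) -> Color:
    """Coarsen an RGB color so that near-identical shades share one histogram bin."""
    r, g, b = color
    return (r // step, g // step, b // step)


def looks_like_text_region(
    pixels: Iterable[Color],
    min_ratio: float = 0.02,  # assumed lower bound on the share of text-colored pixels
    max_ratio: float = 0.40,  # assumed upper bound
    step: int = 32,
) -> bool:
    """Decide from a color histogram whether a region plausibly contains characters."""
    histogram = Counter(quantize(p, step) for p in pixels)
    total = sum(histogram.values())
    ranked = histogram.most_common(2)
    if total == 0 or len(ranked) < 2:
        return False  # empty region, or one flat color: no characters present
    # Assumption: most frequent bin = background, second most frequent = text.
    (_background, _), (_text, text_count) = ranked
    return min_ratio <= text_count / total <= max_ratio
```

A region that is 90% white with 10% black pixels passes this check, while a uniformly colored region or one split evenly between two colors does not.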

(7) In the image processing apparatus of the present invention, the character arrangement determination unit performs corner detection on each region obtained by the polygon approximation, determines the arrangement of characters within the region from the frequency of the detected corners, and determines from that arrangement whether the region is a character input area.

Because the arrangement of characters within each region obtained by the polygon approximation is determined from the frequency of detected corners, and that arrangement is used to decide whether the region is a character input area, the character input area can be extracted with high accuracy.

(8) The image processing apparatus of the present invention further comprises a position determination unit that, when a plurality of character input areas are extracted among the images having the shape of a character input area extracted by the shape determination unit, removes regions other than the character input area on the basis of the ratio of their horizontal lengths and their positional relationship on the screen.

Because regions other than the character input area are removed, when a plurality of candidates are extracted, on the basis of the ratio of their horizontal lengths and their positional relationship on the screen, the character input area can be extracted with high accuracy.

(9) The image processing apparatus of the present invention further comprises a character input completion determination unit that, when the number of characters in an image having the shape of a character input area extracted by the shape determination unit changes over time, determines whether the change in the number of characters has ended. When the change has ended, the character arrangement determination unit extracts the image in which the arrangement of characters approximates the arrangement of characters that would be entered in a character input area for a keyword search.

With this configuration, even when the video shows characters being entered into the character input area gradually, the image having the shape of a character input area can be extracted at the moment the character input is complete.
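One way such a completion check might work is sketched below. The text here does not give the criterion (it refers to the flowchart of FIG. 4), so this sketch simply assumes that input is complete once a nonzero character count has stayed unchanged for a few consecutive sampled frames. The function name `input_completed_at` and the parameter `stable_samples` are hypothetical, and the per-frame counts are assumed to come from some earlier estimation step.

```python
from typing import Optional, Sequence


def input_completed_at(counts: Sequence[int], stable_samples: int = 3) -> Optional[int]:
    """Return the index of the first sampled frame at which the character count
    is nonzero and has been identical for `stable_samples` consecutive samples.
    Returns None if the count never settles."""
    run = 1  # length of the current run of identical counts
    for i in range(1, len(counts)):
        run = run + 1 if counts[i] == counts[i - 1] else 1
        if counts[i] > 0 and run >= stable_samples:
            return i
    return None
```

For the sequence `[0, 0, 0, 1, 2, 3, 4, 4, 4, 4]` the count settles at four characters, and the function reports index 8, the third consecutive sample showing that value.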

(10) In the image processing apparatus of the present invention, the image having the shape of a character input area extracted by the character arrangement determination unit is output to a character recognition unit that performs character recognition, and a keyword search is performed using the characters recognized by the character recognition unit.

With this configuration, a user watching a TV commercial can, for example, perform a keyword search by pressing a remote control button while the character input area is shown.

(11) The image processing method of the present invention detects the image of a character input area contained in video displayed on a screen, and includes at least the steps of: in an extraction unit, extracting images approximating the shape of a character input area from the video displayed on the screen by performing polygon approximation; in a shape determination unit, extracting, from the images extracted by the extraction unit, images having the shape of a character input area; in a character input completion determination unit, when the number of characters in an image extracted by the shape determination unit changes over time, determining whether the change in the number of characters has ended; in a character arrangement determination unit, when the change in the number of characters has ended, extracting, from the images extracted by the shape determination unit, an image in which the arrangement of the characters present in the character input area approximates the arrangement of characters that would be entered in a character input area for a keyword search; and outputting the image extracted by the character arrangement determination unit.

(12) The program of the present invention detects the image of a character input area contained in video displayed on a screen, and encodes the following series of processes as commands that a computer can read and execute: extracting images approximating the shape of a character input area from the video displayed on the screen by performing polygon approximation; extracting, from the extracted images, images having the shape of a character input area; when the number of characters in an extracted image having the shape of a character input area changes over time, determining whether the change in the number of characters has ended; when the change in the number of characters has ended, extracting, from the images having the shape of a character input area, an image in which the arrangement of the characters present in the character input area approximates the arrangement of characters that would be entered in a character input area for a keyword search; and outputting the extracted image.

According to the present invention, only the character input area in the video can be detected with high accuracy. As a result, a keyword search can be performed easily without using a keyboard.

FIG. 1 is a block diagram showing the schematic configuration of an image processing apparatus according to an embodiment of the present invention. FIG. 2 is a main flowchart showing the operation of the image processing apparatus, FIG. 3 is a flowchart showing the operation of extracting the character input area (hereinafter called the "search window area"), and FIG. 4 is a flowchart showing the character input completion determination. In FIG. 1, the image processing apparatus 10 includes a polygon extraction unit 11, a shape determination unit 12, a color histogram calculation unit 13, a character arrangement determination unit 14, a position determination unit 16, and a character input completion determination unit 17. The image processing apparatus 10 receives a video signal from a video signal input unit 19 and outputs its processing result to a character recognition unit 18.

In the embodiment, a television receiver with an Internet connection function incorporates the image processing apparatus 10, and the user can use the functions of the present invention by operating the television's remote control.

In FIG. 2, the image processing apparatus 10 first acquires frame information at regular intervals from the television video the user is watching. Because viewers need to remember the search keyword shown in it, a search window area is displayed for several seconds. Television video generally consists of 29.97 frames per second, so by extracting one frame every few tens of frames (step S1), the image processing apparatus 10 can capture frames containing a search window area without missing any. Next, the search window area is extracted (step S2), following the flowchart shown in FIG. 3.
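The sampling in step S1 can be illustrated with a short sketch. The interval of 30 frames is an assumed value (the text only says "a few tens of frames"), and the names `sample_frames` and `max_gap_seconds` are hypothetical.

```python
from typing import Iterable, Iterator, Tuple, TypeVar

Frame = TypeVar("Frame")

FPS = 29.97  # frame rate stated in the text


def sample_frames(frames: Iterable[Frame], interval: int = 30) -> Iterator[Tuple[int, Frame]]:
    """Yield (index, frame) for one frame out of every `interval` frames."""
    if interval < 1:
        raise ValueError("interval must be >= 1")
    for index, frame in enumerate(frames):
        if index % interval == 0:
            yield index, frame


def max_gap_seconds(interval: int, fps: float = FPS) -> float:
    """Longest time between two consecutive sampled frames."""
    return interval / fps
```

With an interval of 30 frames the gap between samples is about one second, so a search window that stays on screen for several seconds is sampled more than once.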

In FIG. 3, edge extraction and polygon approximation are performed first (step S10). A search window area consists of a keyword enclosed by a frame line, so its outline can be approximated by a closed polygon. The search window area is therefore approximated by a polygon with four or more vertices. For an image frame obtained from the video signal input, the polygon extraction unit 11 first extracts the contours of the input image. Any contour extraction method may be used; for example, edges can be extracted with the Canny filter (J. Canny, "A Computational Approach to Edge Detection," IEEE Trans. on Pattern Analysis and Machine Intelligence, 8(6), pp. 679-698 (1986)).

FIG. 5A shows an example of a current image, and FIG. 5B shows an example of the edge image extracted from it. The edges of a search window area form a closed frame line. To detect this frame line, straight-line detection is performed on the extracted edge image. Any straight-line detection method may be used; for example, a method of detecting straight lines using the Radon transform is disclosed in Japanese Patent Laid-Open No. 2005-275447. Next, it is determined whether the detected straight lines form a closed frame line. Any determination method may be used; for example, the following procedure is possible.
(1) Select one of the detected straight lines and check whether there is another straight line in the region adjacent to it.
(2) Move to the straight line found in (1) and examine its adjacent region in the same way as in (1).
(3) Repeat this process; if it returns to the straight line selected first, the lines are determined to form a closed frame line.
(4) Repeat steps (1) to (3) until all detected edges have been examined.
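The four-step walk above can be sketched in a few lines. This is a simplified illustration, not the method of the source: line segments are assumed to be given as endpoint pairs, "adjacent" is assumed to mean that two endpoints lie within a tolerance `tol`, and the walk is greedy (it does not backtrack at junctions). All names are hypothetical.

```python
from math import hypot
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def _near(p: Point, q: Point, tol: float) -> bool:
    return hypot(p[0] - q[0], p[1] - q[1]) <= tol


def trace_closed_frame(segments: List[Segment], start: int, tol: float = 2.0) -> Optional[List[int]]:
    """Steps (1)-(3): walk from segments[start] through adjacent segments and
    return the indices of the loop if the walk returns to the starting point."""
    first, current_end = segments[start]
    path, used = [start], {start}
    while True:
        if len(path) >= 3 and _near(current_end, first, tol):
            return path  # back at the first line: closed frame
        for i, (a, b) in enumerate(segments):
            if i in used:
                continue
            if _near(a, current_end, tol):
                current_end = b
            elif _near(b, current_end, tol):
                current_end = a
            else:
                continue
            used.add(i)
            path.append(i)
            break
        else:
            return None  # no adjacent line found: not a closed frame


def find_closed_frames(segments: List[Segment], tol: float = 2.0) -> List[List[int]]:
    """Step (4): repeat the walk until every segment has been examined."""
    frames, seen = [], set()
    for start in range(len(segments)):
        if start in seen:
            continue
        loop = trace_closed_frame(segments, start, tol)
        if loop is not None:
            frames.append(loop)
            seen.update(loop)
    return frames
```

The tolerance has to be smaller than the shortest side of the frames of interest, otherwise the walk may close a loop prematurely.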

The detected straight lines are then approximated by a polygon to smooth them. Any approximation algorithm may be used; one example is the Douglas-Peucker algorithm. FIG. 5C shows an example of the result of polygon approximation.
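For reference, a compact pure-Python version of the Douglas-Peucker algorithm is given below. It works on an open polyline; how the source applies it to closed frame lines is not stated, so splitting a closed contour into open pieces before calling it is left as an assumption. The tolerance `epsilon` is a free parameter.

```python
from math import hypot
from typing import List, Tuple

Point = Tuple[float, float]


def _distance_to_chord(p: Point, a: Point, b: Point) -> float:
    """Perpendicular distance from p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = hypot(dx, dy)
    if length == 0.0:
        return hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / length


def douglas_peucker(points: List[Point], epsilon: float) -> List[Point]:
    """Simplify a polyline, keeping only points farther than epsilon from the chord."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _distance_to_chord(points[i], first, last)
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon:
        return [first, last]  # every interior point is close enough to the chord
    left = douglas_peucker(points[: index + 1], epsilon)
    right = douglas_peucker(points[index:], epsilon)
    return left[:-1] + right  # drop the duplicated split point
```

Applied to a slightly noisy L-shaped polyline, the algorithm keeps only the two end points and the corner.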

Next, shape determination is performed (step S11): it is determined whether each polygon has the shape of a search window area (step S12). The shape determination unit 12 removes, from the polygons obtained in step S10, those that are not search window areas. A search window area is usually a rectangle as in FIG. 6A, but shapes such as those in FIGS. 6B and 6C are also seen. Because characters are entered in it, however, a search window is placed parallel to the screen, and because it must evoke an Internet search, it is a horizontally long polygon whose upper and lower line segments are almost equal in length. Characters placed inside a trapezoid or a rhombus do not sufficiently evoke an Internet search. A search window area is therefore a quadrangle with at least four vertices, and in some cases a polygon with more. The shape determination unit 12 accordingly performs the following checks on the polygon approximation result.
(1) Exclude regions with fewer than four vertices. A search window area has at least two parallel straight lines, so it is a polygon with four or more sides, in most cases a quadrangle.
(2) Select the two longest straight lines of the polygon and check that they are sufficiently horizontal with respect to the screen. With the start and end points of the two longest straight lines given by
Line 1: start point $(x_1, y_1)$, end point $(x_2, y_2)$
Line 2: start point $(x_3, y_3)$, end point $(x_4, y_4)$
the check is
$|y_1 - y_2| < P_1$ and $|y_3 - y_4| < P_1$,
where $P_1$ is a threshold.

Checking further that the two longest straight lines are sufficiently parallel is also effective for improving accuracy. To check that the two straight lines are parallel, in addition to the horizontal condition described above, it is examined whether
||y1 - y2| - |y3 - y4|| < P2
holds, where P2 is a threshold value smaller than P1.
(3) Next, it is checked that the lengths of the two longest straight lines do not differ greatly. As the check method, under the above conditions,
||x2 - x1| - |x3 - x4|| < P3
must hold, where P3 is a threshold value.
(4) The height of the search window region is obtained from the distance between the two straight lines, and it is checked that the height is sufficiently small relative to the horizontal length of the search window region. Since check (2) has already confirmed that the two straight lines are nearly horizontal with respect to the screen and nearly parallel to each other, the height of the search window region is obtained by comparing the average values of their y coordinates. That is, the check can be expressed by a formula in which this height is compared with the horizontal length multiplied by a coefficient. Here, P4 is that coefficient.

This shape discrimination process only requires calculation on the coordinate information of the straight lines obtained as the frame of the search window region, and is lightweight compared with the processes described later. Narrowing down the candidates with this process therefore reduces the overall amount of calculation. FIG. 9 conceptually shows the idea behind the shape discrimination process.
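The shape checks (1) to (4) can be sketched in code as follows. This is a minimal illustration, not the implementation of the embodiment: the function names, the `Thresholds` container and every numeric value are hypothetical, and the form used for check (4), height < P4 × width, is an assumption based on the statement that the height must be sufficiently small relative to the horizontal length.

```python
from dataclasses import dataclass
from math import hypot

Point = tuple[float, float]


@dataclass(frozen=True)
class Thresholds:
    # All values are placeholders chosen for illustration only.
    p1: float = 3.0   # check (2): max |dy| along each of the two longest lines
    p2: float = 2.0   # optional parallel check, smaller than p1
    p3: float = 10.0  # check (3): max difference between the two line lengths
    p4: float = 0.3   # check (4): height must be below p4 * width (assumed form)


def two_longest_edges(polygon: list[Point]) -> tuple[tuple[Point, Point], tuple[Point, Point]]:
    """Return the two longest edges of a closed polygon as (start, end) pairs."""
    n = len(polygon)
    edges = [(polygon[i], polygon[(i + 1) % n]) for i in range(n)]
    edges.sort(key=lambda e: hypot(e[1][0] - e[0][0], e[1][1] - e[0][1]), reverse=True)
    return edges[0], edges[1]


def looks_like_search_window(polygon: list[Point], t: Thresholds = Thresholds()) -> bool:
    # (1) Exclude regions with fewer than four vertices.
    if len(polygon) < 4:
        return False
    ((x1, y1), (x2, y2)), ((x3, y3), (x4, y4)) = two_longest_edges(polygon)
    # (2) Both longest lines must be nearly horizontal.
    if not (abs(y1 - y2) < t.p1 and abs(y3 - y4) < t.p1):
        return False
    # Optional refinement: the two lines must be nearly parallel.
    if not abs(abs(y1 - y2) - abs(y3 - y4)) < t.p2:
        return False
    # (3) The two lines must have similar horizontal extent.
    if not abs(abs(x2 - x1) - abs(x3 - x4)) < t.p3:
        return False
    # (4) Height from the averaged y coordinates, compared with the width.
    height = abs((y1 + y2) / 2 - (y3 + y4) / 2)
    width = max(abs(x2 - x1), abs(x3 - x4))
    return height < t.p4 * width
```

Because only vertex coordinates are touched, the cost per candidate is constant, which matches the remark that this stage is cheap enough to run before the pixel-level checks.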

Next, character arrangement discrimination is performed (step S14). A search keyword is written in the search window region. Because the search window region imitates an Internet search engine, the search keyword is horizontally centered or left-aligned, and vertically centered or positioned toward the bottom. By discriminating this character position, the character arrangement discriminating unit 14 removes scoreboards in baseball broadcasts, program captions (telops), columns of tables displayed on the screen, and the like. The character position is discriminated by either or both of the following two processes.

(1) Method of examining the distribution of the color histogram
The color histogram calculation unit 13 calculates a color histogram inside the detected polygon. The color with the highest frequency in the histogram can be judged to be the background color, and the color with the second highest frequency to be the character color. Whether the region is a search window region containing characters is determined from the arrangement of the pixels having this second most frequent color value.
For this discrimination, let the upper-left vertex of the detected search window region candidate be (x1, y1) and the lower-right vertex be (xn, yn). For all pixels whose vertical coordinate is y1, the appearance frequency of pixels having the second most frequent color is calculated. The same processing is performed for the rows y1+1, y1+2, and so on, until the frequency has been calculated up to yn. In a search window region containing characters, the pixel appearance frequency exceeds the threshold at y coordinates close to yn (that is, bottom-aligned), and at x coordinates close to x1 or near the center (that is, left-aligned or centered). Therefore, if
(average appearance frequency near y1) ≥ (average appearance frequency near yn)
(average appearance frequency near x1) ≤ (average appearance frequency near xn)
holds, the region is determined not to be a search window region (step S13).
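A minimal sketch of the row and column profiles used in method (1), under stated assumptions: the region is a 2-D list of already quantized color values, "near" y1, yn, x1 and xn is taken as a hypothetical band covering 25% of the rows or columns, and either inequality alone is treated as sufficient for rejection because the text lists the two conditions without a conjunction. All function names are illustrative.

```python
from collections import Counter


def second_most_common_color(region):
    """Assumed character color: the second most frequent value in the region."""
    ranked = Counter(px for row in region for px in row).most_common(2)
    return ranked[1][0] if len(ranked) > 1 else None


def text_profiles(region):
    """Count character-colored pixels per row and per column."""
    color = second_most_common_color(region)
    rows = [sum(px == color for px in row) for row in region]
    cols = [sum(row[x] == color for row in region) for x in range(len(region[0]))]
    return rows, cols


def mean(values):
    return sum(values) / len(values)


def rejected_by_histogram(region, band=0.25):
    """True when the character color sits toward the top or toward the right."""
    rows, cols = text_profiles(region)
    kr = max(1, int(len(rows) * band))  # rows counted as "near y1" / "near yn"
    kc = max(1, int(len(cols) * band))  # columns counted as "near x1" / "near xn"
    top, bottom = mean(rows[:kr]), mean(rows[-kr:])
    left, right = mean(cols[:kc]), mean(cols[-kc:])
    # Assumption: either condition disqualifies the candidate.
    return top >= bottom or left <= right
```

A region with a single color has no second color, so every profile is zero and the candidate is rejected, which is consistent with requiring characters inside the window.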
(2) Method of examining the distribution of character corners by corner detection
Characters in any language are composed of straight lines, curves, and points, so a character region contains corners of various sizes. Detecting these corners therefore makes it possible to detect the arrangement of characters. Existing methods such as the Harris method can be used for corner detection, but the part that judges, from the detected corners, how closely the character arrangement resembles that of a search window region is one of the features of the present invention.

FIG. 7A shows an example of a search window region detected by the processing so far, and FIG. 7B shows the result of performing corner detection on that region. The circled portions are the detected corners.

Here, the character arrangement is discriminated from the appearance frequency of corners in each of the horizontal and vertical directions. The inside of the search window region is divided into a suitable number of areas in the vertical and horizontal directions (see FIG. 8), the number of edges in each divided area is counted, and any area in which the number of detected edges > P is judged to be an area containing characters.

Among the divided areas, the number of columns consisting only of areas in which no characters appear, counted from the right, is defined as the "number of blanks on the right, Empty_r". In FIG. 8, the number of blanks on the right is 2. The numbers of blanks on the left (Empty_l), at the top (Empty_c), and at the bottom (Empty_f) are calculated in the same way.

Because characters are left-aligned or centered in the horizontal direction, and centered or bottom-aligned in the vertical direction, the region is determined not to be a search window region if
Empty_l ≥ T1 × Empty_r
Empty_f ≥ T2 × Empty_c
holds. Here, T1 and T2 are constants.
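The grid division and blank counting can be sketched as follows. The corner coordinates are assumed to come from an existing detector such as the Harris method and are passed in as plain (x, y) pairs relative to the candidate region. The grid size, the threshold `p`, the constants `t1` and `t2`, and the choice to reject when either inequality holds are all assumptions made for illustration.

```python
def occupancy_grid(corners, width, height, cols, rows, p):
    """Mark each grid cell True when it holds more than p detected corners."""
    counts = [[0] * cols for _ in range(rows)]
    for x, y in corners:
        c = min(int(x * cols / width), cols - 1)
        r = min(int(y * rows / height), rows - 1)
        counts[r][c] += 1
    return [[n > p for n in row] for row in counts]


def leading_blank(lines):
    """Number of consecutive lines, from the start, that contain no characters."""
    n = 0
    for line in lines:
        if any(line):
            break
        n += 1
    return n


def blank_counts(grid):
    columns = [list(col) for col in zip(*grid)]
    return {
        "left": leading_blank(columns),              # Empty_l
        "right": leading_blank(reversed(columns)),   # Empty_r
        "top": leading_blank(grid),                  # Empty_c
        "bottom": leading_blank(reversed(grid)),     # Empty_f
    }


def rejected_by_corner_layout(grid, t1=2.0, t2=2.0):
    e = blank_counts(grid)
    # Assumption: either inequality alone disqualifies the candidate.
    return e["left"] >= t1 * e["right"] or e["bottom"] >= t2 * e["top"]
```

With constants greater than 1, centered text (equal blanks on both sides) passes, while text pushed to the right or to the top is rejected.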

As described above, if the histogram arrangement inside the polygon is appropriate for a search window region, the polygon is regarded as a search window candidate (step S15).

Next, position discrimination is performed (step S16). In television commercials that use a search window region, an object imitating a search button is often placed beside the search window region. Because this object tends to closely resemble the inside of the search window region, it is easily misdetected as one; however, search buttons have the following tendency.
"The horizontal length is smaller than that of the search window region, and the object is placed to the right of the search window region."

For the extracted search window region candidates, if one of them satisfies the above, the position determination unit 16 judges it to be a search button and removes it. That is, when multiple candidate regions for the search window are found in one image, let the upper-right coordinates of the region forming candidate n be (xn, yn) and its horizontal length wn, and let the upper-right coordinates of the region forming candidate m be (xm, ym) and its horizontal length wm. When
|yn - ym| < P5 and
xn <= xm - wm and
wn > wm × P6
are satisfied, candidate m is judged to be a search button and is removed from the search window region candidates. If a candidate is judged in step S16 to be a search button, it is not a search window candidate (step S17). If it is not a search button, the process proceeds to the next step.
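The three conditions translate directly into a pairwise filter over the candidates. In the sketch below the `Candidate` type, the function names and the default values of `p5` and `p6` are hypothetical; only the inequalities themselves come from the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    x_right: float  # x coordinate of the upper-right corner
    y_top: float    # y coordinate of the upper-right corner
    width: float    # horizontal length of the region


def is_button_of(m: Candidate, n: Candidate, p5: float, p6: float) -> bool:
    """True when m looks like a search button placed to the right of window n."""
    return (
        abs(n.y_top - m.y_top) < p5              # roughly the same vertical position
        and n.x_right <= m.x_right - m.width     # n ends before m's left edge
        and n.width > m.width * p6               # n is clearly wider than m
    )


def remove_search_buttons(candidates, p5=5.0, p6=2.0):
    """Drop every candidate that is a search button relative to some other candidate."""
    return [
        m for m in candidates
        if not any(is_button_of(m, n, p5, p6) for n in candidates if n is not m)
    ]
```

For example, a 250-pixel-wide window ending at x = 300 and a 60-pixel-wide region spanning x = 310 to 370 on nearly the same row leave only the window after filtering.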

Next, display time discrimination is performed (step S18). In a search window advertisement, the search window region is displayed at least for as long as the user needs to see it, so it is determined whether the region is displayed for a fixed time n. Here, a characteristic of the search window region is used: as characters are dynamically entered into the window, the inside of the region changes, but the frame lines that form the region do not. The discrimination is made according to whether these frame lines stay fixed.

For this discrimination, the edge extraction process and the shape discrimination process described above are performed. As noted earlier, search window regions include not only rectangles but also shapes such as that in FIG. 6B, so for simplicity this display time discrimination uses the two horizontal line segments: if the longest line segment of the search window region candidate in the current frame is at the same position n frames later, the candidate is judged to still exist after n frames. The processing up to this point narrows the search window region candidates down to one (step S19).
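A minimal sketch of the frame-line persistence test, assuming each horizontal segment is given as (x_start, y_start, x_end, y_end) and that "the same position" allows a small pixel tolerance. The tolerance value and all names are hypothetical.

```python
Segment = tuple[float, float, float, float]


def same_position(a: Segment, b: Segment, tol: float = 2.0) -> bool:
    """True when every coordinate of the two segments agrees within tol pixels."""
    return all(abs(p - q) <= tol for p, q in zip(a, b))


def persists(segments_now: list[Segment], segments_later: list[Segment], tol: float = 2.0) -> bool:
    """True when each segment of the current frame is found again n frames later."""
    return all(
        any(same_position(s, t, tol) for t in segments_later)
        for s in segments_now
    )
```

Only the frame lines are compared, so characters appearing inside the window between the two frames do not affect the result.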

In FIG. 2, when extraction of the search window region is finished (step S2), the search window region is presented to the user (step S3). Next, character input completion is determined (step S4). This determination is performed according to the flowchart shown in FIG. 4. Characters in the search window region are displayed in one of the following two ways.
(1) All characters are entered from the beginning.
(2) Characters are entered gradually.

Therefore, in case (2), it is desirable to acquire the character string in the recognized search window region not immediately but at the timing when character input is complete. One easily conceived way to judge completion is to wait until a scene change occurs before acquiring the search window region, because character input is complete in the region's final state. However, with that approach, in case (1) it takes a long time before the window region is recognized.

Therefore, character input completion determination that handles both (1) and (2) is performed based on the flowchart shown in FIG. 4. After acquiring the search window region candidate, the character input completion determining unit 17 likewise acquires a subsequent frame that is m frames later in time than the target frame (step S20). As described above, the search window region is displayed at least for as long as the user needs to see it, so the acquisition interval m is set shorter than that time.

Here, scene change detection is performed (step S21). If a scene change is detected, the search window region of the current frame is acquired as being in the state where character input is complete (step S22). If no scene change is detected in step S21, color histogram analysis is performed on the inside of the search window region and the result is compared with that of the immediately preceding frame (step S23). The subsequent frame histogram information 15 shown in FIG. 1 is used here. If the histogram has changed, it is judged that characters are still being entered, the process returns to step S20, and the image m frames later is acquired. If the pixel appearance frequency has not changed, it is judged that character input has finished, and the image in the search window region is acquired (step S22).
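The loop through steps S20 to S23 can be sketched as follows. The sketch assumes that the search window pixels have already been sampled every m frames into a list, that the scene change detector is supplied by the caller, that "the current frame" on a scene change means the last sample before the change, and that "no change" means exactly equal histograms. These choices and all names are illustrative assumptions.

```python
from collections import Counter


def histogram(region):
    """Color histogram of the pixels inside the search window region."""
    return Counter(px for row in region for px in row)


def frame_at_input_completion(samples, is_scene_change):
    """Return the sampled region judged to show the completed character input.

    samples: search window regions taken every m frames (step S20).
    is_scene_change: callable(previous, following) -> bool (step S21).
    """
    current = samples[0]
    for following in samples[1:]:
        if is_scene_change(current, following):
            return current                      # S22: keep the frame before the cut
        if histogram(following) == histogram(current):
            return following                    # S23 -> S22: input has stopped
        current = following                     # S23 -> S20: still typing
    return current
```

When all characters are present from the start, the first comparison already finds an unchanged histogram, so the function returns after a single interval instead of waiting for the scene change.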

FIG. 10 shows examples of cases that are and are not recognized as a search window region as a result of the above processes. Although it cannot be expressed in a line drawing, after a region has been approximated by a polygon, an additional determination can also be made using the fact that the hues in the region are typical of a search window (that is, a character color exists on a single-color background, and the histogram is concentrated at two places).

After the image in the search window region has been acquired as described above, character recognition is performed. That is, character recognition is performed on the acquired image in the search window region to obtain a keyword. An existing method can be used for this processing.

When a keyword has been extracted by the above processes, the image processing apparatus 10 notifies the user that a search is possible. For example, the notification is displayed on the screen as "Press the Info button to search for XX". A user who is interested in the extracted keyword can press the Info button to start the browser provided in the video reception device and easily search using the keyword.

As described above, according to the present embodiment, only the character input area in the video can be detected with high accuracy. As a result, a keyword search can be performed easily without using a keyboard.

FIG. 1 is a block diagram showing a schematic configuration of an image processing apparatus according to an embodiment of the present invention.
FIG. 2 is a main flowchart showing the operation of the image processing apparatus according to the embodiment of the present invention.
FIG. 3 is a flowchart showing the extraction operation for a character input area (hereinafter referred to as a "search window region").
FIG. 4 is a flowchart showing the character input completion determination operation.
FIG. 5A shows an example of a current image.
FIG. 5B shows an example of an edge image extracted from FIG. 5A.
FIG. 5C shows an example of the result of polygon approximation.
FIG. 6A shows a shape of the search window region.
FIG. 6B shows a shape of the search window region.
FIG. 6C shows a shape of the search window region.
FIG. 7A shows an example of a search window region detected by the above processing.
FIG. 7B shows the result of performing corner detection on that region.
FIG. 8 shows the search window region divided into a suitable number of areas in the vertical and horizontal directions.
FIG. 9 conceptually shows the shape discrimination process.
FIG. 10 shows examples of cases that are and are not recognized as a search window region as a result of the above processes.

Explanation of symbols

10 Image processing apparatus
11 Polygon extraction unit
12 Shape discriminating unit
13 Color histogram calculation unit
14 Character arrangement discriminating unit
16 Position determination unit
17 Character input completion determining unit
18 Character recognition unit
19 Video signal input unit

Claims

An image processing apparatus for detecting a video of a character input area included in a video displayed on a screen, comprising:
an extraction unit that extracts, from the video displayed on the screen, a video approximating the shape of a character input area;
a shape discriminating unit that extracts, from the video approximating the shape of a character input area extracted by the extraction unit, a video having the shape of a character input area; and
a character arrangement determining unit that extracts, from the video having the shape of a character input area extracted by the shape discriminating unit, a video having the shape of a character input area in which the arrangement of characters present in the video of the character input area approximates the arrangement of characters input to a character input area at the time of keyword search.

The image processing apparatus according to claim 1, wherein the extraction unit extracts a video approximating a shape of a character input area from a video displayed on the screen by performing polygon approximation.

The image processing apparatus according to claim 2, wherein the extraction unit extracts, as a candidate for a character input area, a polygon whose two longest straight lines are horizontal with respect to the screen, from among polygons having four or more vertices in the result of the polygon approximation.

The image processing apparatus according to claim 2, wherein the extraction unit selects, from among polygons having four or more vertices, polygons whose two longest straight lines are horizontal with respect to the screen, and extracts, as a candidate for a character input area, a polygon in which the lengths of the two straight lines are equal.

The image processing apparatus according to claim 2, wherein the extraction unit obtains the height of a polygon from the distance between two straight lines that are horizontal with respect to the screen in the result of the polygon approximation, and extracts, as a candidate for a character input area, a region whose height is equal to or less than a predetermined ratio of the horizontal length of the polygon.

The image processing apparatus according to claim 2, wherein the character arrangement determining unit calculates a color histogram for the region obtained by the polygon approximation, determines a background color and a character color from the frequencies, determines the appearance frequency of characters in the region from the appearance frequency of the character color, and thereby determines whether or not the region is a character input area.

The image processing apparatus according to claim 2, wherein the character arrangement determination unit performs corner detection on the region obtained by the polygon approximation, determines the arrangement of characters in the region from the frequency of detected corners, and determines from the character arrangement whether or not the region is a character input area.

The image processing apparatus according to claim 1, further comprising a position determination unit that, when a plurality of character input areas are extracted from the video having the shape of a character input area extracted by the shape determination unit, removes regions other than the character input area based on the ratio of their horizontal lengths and their positional relationship on the screen.

The image processing apparatus according to claim 1 or 2, further comprising a character input completion determination unit that, when the number of characters in the video of the character input area changes with time in the video having the shape of a character input area extracted by the shape determination unit, determines whether or not the change in the number of characters has ended,
wherein, when the change in the number of characters has ended, the character arrangement determination unit extracts a video having the shape of a character input area in which the arrangement of characters approximates the arrangement of characters input to a character input area at the time of keyword search.

The image processing apparatus according to any one of claims 1 to 9, wherein the video having the shape of a character input area extracted by the character arrangement determination unit is output to a character recognition unit that performs character recognition, and a keyword search is performed using the characters recognized by the character recognition unit.

An image processing method for detecting a video of a character input area included in a video displayed on a screen, the method comprising at least:
a step in which an extraction unit extracts, by performing polygon approximation, a video approximating the shape of a character input area from the video displayed on the screen;
a step in which a shape determination unit extracts, from the video approximating the shape of a character input area extracted by the extraction unit, a video having the shape of a character input area;
a step in which a character input completion determination unit determines, when the number of characters in the video of the character input area changes with time in the video having the shape of a character input area extracted by the shape determination unit, whether or not the change in the number of characters has ended;
a step in which a character arrangement determination unit extracts, when the change in the number of characters has ended, from the video having the shape of a character input area extracted by the shape determination unit, a video having the shape of a character input area in which the arrangement of characters present in the video of the character input area approximates the arrangement of characters input to a character input area at the time of keyword search; and
a step of outputting the video having the shape of a character input area extracted by the character arrangement determination unit.

A program for detecting a video of a character input area included in a video displayed on a screen, the program causing a computer to read and execute, as commands, a series of processes comprising:
a process of extracting, by performing polygon approximation, a video approximating the shape of a character input area from the video displayed on the screen;
a process of extracting, from the extracted video approximating the shape of a character input area, a video having the shape of a character input area;
a process of determining, when the number of characters in the video of the character input area changes with time in the extracted video having the shape of a character input area, whether or not the change in the number of characters has ended;
a process of extracting, when the change in the number of characters has ended, from the extracted video having the shape of a character input area, a video having the shape of a character input area in which the arrangement of characters present in the video of the character input area approximates the arrangement of characters to be input to a character input area at the time of keyword search; and
a process of outputting the extracted video having the shape of a character input area.