JP2006157108A

JP2006157108A - Video recording / playback device

Info

Publication number: JP2006157108A
Application number: JP2004340141A
Authority: JP
Inventors: Tetsuya Hayashi; 林　　哲也
Original assignee: Teac Corp
Current assignee: Teac Corp
Priority date: 2004-11-25
Filing date: 2004-11-25
Publication date: 2006-06-15

Abstract

Problem: To provide a video recording/reproducing apparatus with which a desired scene can be searched for more easily.
Solution: A video recording/reproducing apparatus 10 records a video signal input from an external device 34 on a recording medium 14. Within the recorded video, an index is assigned to each location where a predetermined keyword occurs. Whether the keyword occurs, and how much, is detected by a keyword occurrence amount detection unit 16, and the detection result is output to a level determination unit 24. Only when the keyword occurrence amount is equal to or greater than a predetermined reference value does the level determination unit 24 generate index information to be assigned to the location where the keyword occurred and output it to a recording unit 12. The recording unit 12 records the generated index information on the recording medium 14 together with the video signal.
Representative drawing: Figure 1

Description

The present invention relates to a video recording/reproducing apparatus that records video on a recording medium and reproduces video recorded on the recording medium.

Some video recording/reproducing apparatuses detect, in the video being recorded, each location where a preset keyword occurs and assign an index to that location.

For example, Patent Document 1 discloses a video processing apparatus that recognizes the audio contained in video, extracts keywords from the recognized audio, and records an index indicating each keyword occurrence on a recording medium together with the video. Patent Document 2 discloses a scene search apparatus that extracts words specific to the program being recorded, creates an index for video search from them, and records this index on a recording medium together with the video and audio signals.

Assigning an index wherever a specific keyword appears makes it easy to search for a desired scene. If users register a term or person's name that interests them as a keyword, they can find scenes related to that term or name by checking where indexes have been assigned, without watching all of the video recorded on the recording medium.

Patent Document 1: JP 2002-171481 A
Patent Document 2: JP 2000-236494 A

However, all of the above techniques decide whether to assign an index based only on "the presence or absence of a keyword". As a result, even a scene in which the keyword occurs only once, and which has little to do with the keyword, receives an index. It may then be impossible to tell clearly from the index assignment status where the desired scene is. Moreover, if a single occurrence is enough to assign an index, the number of indexes tends to grow. Users normally watch the video at each indexed location to judge whether that scene really relates to the keyword, so the more indexed locations there are, the more scenes must be watched and the lower the efficiency of the scene search.

In addition, the above techniques assign the same index to a scene in which the keyword occurs once within a unit time and to a scene in which it occurs many times. The index assignment status alone therefore does not reveal which scenes are closely related to the keyword.

Accordingly, an object of the present invention is to provide a video recording/reproducing apparatus with which a desired scene can be searched for more easily.

The video recording/reproducing apparatus of the present invention records video on a recording medium and comprises: a storage unit that stores a keyword specified in advance by a user; detection means that analyzes the input video and detects the keyword occurrence amount for each predetermined unit time; determination means that determines, for each predetermined unit time, whether the keyword occurrence amount is equal to or greater than a reference value; and index recording means that, when the keyword occurrence amount is equal to or greater than the reference value, records index information indicating the keyword occurrence on the recording medium together with position information indicating where the keyword occurred.

In a preferred aspect, the detection means calculates the occurrence amount from the number of times the keyword occurs within a predetermined time. When the keyword occurs as speech, the detection means preferably calculates the occurrence amount from the number of occurrences and the volume of the keyword at each occurrence. When the keyword occurs as on-screen text, the detection means preferably calculates the occurrence amount from the number of occurrences and at least one of the color, display time, and screen occupancy of the keyword.

In another preferred aspect, the index recording means also records the keyword occurrence amount as part of the index information. The apparatus preferably further includes presentation means that, based on the index information recorded on the recording medium, presents to the user an index display screen showing where the keyword occurs and its occurrence amount. Preferably, the user can also specify on the index display screen the position from which playback of the recorded video should start.

Another video recording/reproducing apparatus according to the present invention records video on a recording medium and comprises: a storage unit that stores a keyword specified in advance by a user; detection means that analyzes the input video and detects the keyword occurrence amount for each predetermined unit time; index recording means that records index information indicating the keyword occurrence position and occurrence amount on the recording medium in association with that occurrence position; and presentation means that, based on the index information recorded on the recording medium, presents to the user an index display screen showing where the keyword occurs and its occurrence amount.

Another video recording/reproducing apparatus according to the present invention records video composed of a plurality of scenes on a recording medium and comprises: a storage unit that stores a keyword specified in advance by a user; detection means that analyzes the input video and detects the keyword occurrence amount for each predetermined unit time; determination means that determines, for each predetermined unit time, whether the keyword occurrence amount is equal to or greater than a reference value; and video recording means that, when it is, records the video of the scene in which the keyword occurs on the recording medium.

Another video recording/reproducing apparatus according to the present invention reproduces video recorded on a recording medium and comprises: a storage unit that stores a keyword specified in advance by a user; detection means that analyzes the video recorded on the recording medium and detects the keyword occurrence amount for each predetermined unit time; determination means that determines, for each predetermined unit time, whether the keyword occurrence amount is equal to or greater than a reference value; and video deletion means that deletes from the recording medium any video whose keyword occurrence amount is less than the reference value.

Here, "video" refers to a signal composed of audio and moving images. "Recording medium" refers to any medium on which a digital video signal can be recorded, including optical discs such as DVD, Blu-ray, and HD-DVD as well as hard disks.

According to the present invention, an index is assigned only to locations where the keyword occurrence amount is equal to or greater than a predetermined reference value. In other words, indexes are assigned only to locations that are clearly related to the keyword. As a result, a desired scene can be found easily by checking where indexes have been assigned.

An embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a schematic block diagram of a video recording/reproducing apparatus 10 according to the embodiment. The apparatus 10 records video on, and reproduces video from, a recording medium 14 such as an optical disc (DVD, Blu-ray, HD-DVD, etc.) or a hard disk; DVD recorders and hard disk recorders are typical examples. The apparatus 10 can assign indexes to recorded video. An index is a marker attached to a portion of the recorded video that deserves particular attention. In this embodiment, an index is attached to each location where a preset keyword occurs. The apparatus 10 is described in detail below.

The external device 34 may be any device capable of outputting a video signal, such as a television receiver, another video playback device, or a TV tuner. The video signal output from the external device 34 is sent to the recording unit 12 and to the keyword occurrence amount detection unit 16. The recording unit 12 records the input video signal, that is, the moving image signal and the audio signal, on the recording medium 14. Where necessary, the moving image and audio signals undergo A/D conversion and MPEG encoding before they are input to the recording unit 12. The recording unit 12 also records on the recording medium 14 the index information generated by the index generation unit 24, described later.

The keyword occurrence amount detection unit 16 divides the input video signal into scenes of a predetermined unit time (for example, one minute) and detects, for each scene, whether the keyword occurs and its occurrence amount. The video signal may come from the external device during recording, or it may be a video signal reproduced by the reproduction unit 30 described later. Keywords are set in advance by the user and stored in a keyword storage unit 26 composed of RAM or the like.
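As a rough illustration of the per-scene division, the following Python sketch groups detected keyword occurrences into fixed one-minute scenes and counts them per keyword. It assumes, hypothetically, that the recognizers report each detection as a (timestamp in seconds, keyword) pair; the function name count_per_scene is illustrative and not taken from the source.

```python
from collections import Counter

SCENE_SECONDS = 60.0  # example unit time from the text (one minute)


def count_per_scene(detections, scene_seconds=SCENE_SECONDS):
    """Group (timestamp_s, keyword) detections into fixed-length scenes.

    Returns a dict mapping scene index -> Counter of keyword occurrences.
    Scene k covers the interval [k * scene_seconds, (k + 1) * scene_seconds).
    """
    scenes = {}
    for t, keyword in detections:
        if t < 0:
            raise ValueError("timestamps must be non-negative")
        scene = int(t // scene_seconds)
        scenes.setdefault(scene, Counter())[keyword] += 1
    return scenes


# Hypothetical detections: two in the first minute, two in the second.
scenes = count_per_scene([(5.0, "DVD"), (42.5, "DVD"), (61.0, "DVD"), (75.2, "Blu-ray")])
print(scenes)
```

Scenes with no detections simply do not appear in the result, which matches the idea that only scenes where a keyword occurs are candidates for an index.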

Keywords are detected by a voice recognition unit 18 and an image recognition unit 20. The voice recognition unit 18 analyzes the input audio signal and determines whether it contains speech representing a keyword. If it does, the unit outputs the recognition result and the volume of the audio signal for that keyword to the level determination unit 22.

The image recognition unit 20 analyzes the input moving image signal and determines whether it contains text representing a keyword. This image analysis method is described in detail later. If the unit determines that such text is present, it calculates the display time, color, screen occupancy, and so on of the text and outputs them to the level determination unit 22.

Based on the recognition results output from the voice recognition unit 18 and the image recognition unit 20, the level determination unit 22 calculates the keyword occurrence amount in each scene and determines the level of the scene from that amount. The calculation method is described in detail later; essentially, the occurrence amount is derived mainly from how often the keyword occurs. The level is a parameter expressing the keyword occurrence amount in a scene: the larger the value, the greater the occurrence amount. When multiple keywords are set, the occurrence amount and level are determined separately for each keyword. The level determination result, the voice recognition result, and the image recognition result are output to the index generation unit 24.

The index generation unit 24 generates the index information to be assigned to each scene in which a keyword occurs. Whether index information is generated is decided from the level of each scene. If the level of the target scene, that is, its keyword occurrence amount, is below a predetermined reference value, no index is assigned to the scene and no index information is generated. If the level is equal to or greater than the reference value, index information for the scene is generated. The index information includes the keyword occurrence position (scene), the occurrence amount of each keyword, the manner of occurrence (for example, whether the keyword occurred as speech or as text), and so on. The generated information is output to the recording unit 12 and recorded on the recording medium in association with the video signal of the scene.
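A minimal sketch of the index information record and the generate-or-skip decision follows. The IndexInfo fields mirror the items listed above (position as a scene number, occurrence amount, manner of occurrence), but the field names, the make_index helper, and the choice of Python are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class IndexInfo:
    """Hypothetical layout of one index entry (field names are illustrative)."""
    scene: int    # occurrence position, expressed as a scene number
    keyword: str
    amount: float  # keyword occurrence amount in that scene
    modes: tuple   # manner of occurrence, e.g. ("speech", "text")


def make_index(scene: int, keyword: str, level: int, amount: float,
               modes: Sequence[str], reference_level: int) -> Optional[IndexInfo]:
    """Return index information only if the scene's level reaches the reference."""
    if level < reference_level:
        return None  # below the reference value: no index for this scene
    return IndexInfo(scene, keyword, amount, tuple(modes))


entry = make_index(4, "DVD", level=3, amount=4.5, modes=["speech", "text"], reference_level=2)
skipped = make_index(3, "DVD", level=1, amount=1.0, modes=["speech"], reference_level=2)
print(entry, skipped)
```

Returning None for a skipped scene keeps the caller's loop simple: only non-None entries are passed on to the recording step.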

The reproduction unit 30 reproduces the video signal recorded on the recording medium 14. The reproduced video signal is normally output to a display 36 such as a television monitor. When indexes are to be assigned to video that has already been recorded on the recording medium 14, the reproduced video signal is also output to the keyword occurrence amount detection unit 16.

The reproduction unit 30 can also reproduce the index information recorded on the recording medium 14, which it outputs to an index screen generation unit 32. From this index information, the index screen generation unit 32 generates an index screen showing where indexes have been assigned. The index screen presents the keyword occurrence locations, occurrence amounts, and similar information visually, using time bars, bar graphs, and the like. The generated index screen is output to the display 36.
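The index screen is described only as using time bars and bar graphs. As a hypothetical stand-in, this sketch renders per-scene occurrence amounts as a text bar graph, one line per indexed scene with its start time; render_index_screen and its parameters are illustrative and not part of the source.

```python
def render_index_screen(amounts, scene_seconds=60, width=20):
    """Render per-scene occurrence amounts as a text bar graph.

    amounts maps scene index -> occurrence amount (indexed scenes only).
    Each line shows the scene start time (mm:ss) and a bar scaled so the
    largest amount fills `width` characters.
    """
    if not amounts:
        return "(no indexes)"
    peak = max(amounts.values()) or 1  # avoid dividing by zero
    lines = []
    for scene in sorted(amounts):
        minutes, seconds = divmod(int(scene * scene_seconds), 60)
        bar = "#" * max(1, round(width * amounts[scene] / peak))
        lines.append(f"{minutes:02d}:{seconds:02d}  {bar} {amounts[scene]:g}")
    return "\n".join(lines)


print(render_index_screen({0: 2.0, 2: 4.0, 7: 1.5}))
```

On a real device the same data would drive a graphical time bar, and selecting a line could set the playback start position mentioned earlier.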

The control unit 28 controls the entire apparatus 10 according to user instructions entered via a user interface (U/I) 29. The user interface 29 is, for example, a remote control or switches, which the user operates to control the apparatus 10. In response to user instructions, the control unit 28 records and reproduces video, displays the index display screen, and so on.

Next, keyword detection is described in detail. A keyword may occur either as speech or as text (an image). Various well-known speech recognition techniques can be used to detect spoken keywords. For example, the voice recognition unit 18 computes the frequency content and similar properties of the input audio signal to obtain its speech feature pattern. It then compares this pattern with the pre-stored speech feature pattern of the keyword and, if the two match or are close, recognizes the audio signal as the keyword. In that case it also detects the volume of the audio signal, and it outputs the recognition result and the volume to the level determination unit 22.
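The comparison of speech feature patterns can be sketched as follows, under stated assumptions: each pattern is a fixed-length numeric vector, and "match or close" is taken to mean a cosine similarity at or above a threshold (0.9 here, chosen arbitrarily). Practical recognizers typically compare time sequences of features with more elaborate methods, so this is only a toy model of the matching step; match_keyword and the template dictionary are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_keyword(features, templates, threshold=0.9):
    """Return the keyword whose stored pattern best matches `features`.

    Returns None when no template reaches the similarity threshold.
    """
    best_keyword, best_similarity = None, threshold
    for keyword, template in templates.items():
        if len(template) != len(features):
            continue  # patterns of different length are not comparable here
        similarity = cosine_similarity(features, template)
        if similarity >= best_similarity:
            best_keyword, best_similarity = keyword, similarity
    return best_keyword


# Hypothetical 3-element feature patterns.
templates = {"DVD": [1.0, 0.2, 0.0], "recorder": [0.0, 1.0, 0.5]}
print(match_keyword([0.9, 0.25, 0.05], templates))
```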

Keywords that occur as text (images) are detected by the image recognition unit 20. Various well-known image analysis techniques, such as pattern matching, can be used for this. Pattern matching is an image analysis technique that decides whether two images match by comparing a feature pattern obtained from the input image with the feature pattern of a reference image stored in advance. A specific analysis method using pattern matching is described with reference to FIG. 2. FIG. 2(a) shows an example of an input image (hereinafter, "target image 40"), and FIG. 2(b) shows a reference character image 42 stored in the keyword storage unit 26, which is composed of RAM or the like.

First, the image recognition unit 20 performs edge detection on the target image 40. Edge detection obtains the luminance value of each pixel; if the luminance difference between a pixel and an adjacent pixel is equal to or greater than a predetermined value, the pixel is extracted as an edge. This yields the contour of the target image 40. Next, the pixels at the corners of this contour are extracted as feature points 40a. For an image of the katakana character "キ" (ki), for example, 20 feature points 40a are obtained, as shown in FIG. 2(a). The feature points 40a of the target image 40 are then compared with the feature points 42a of the reference character image 42 stored in the keyword storage unit 26, as follows. First, from among the feature points 40a and 42a, reference points 40b and 42b are determined for the target image 40 and the reference character image 42, respectively. The two images are then superimposed so that the reference points 40b and 42b coincide. With the reference points kept coincident, the size and angle of the target image 40 or the reference character image 42 are adjusted so that the deviation between the feature points 40a and the feature points 42a is minimized. Once the deviation is minimized, it is determined whether that minimum deviation is less than a predetermined reference value. If it is, the target image 40 is judged to be the character represented by the reference character image 42.
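The alignment step above (superimpose the reference points, then adjust size and angle to minimize the feature-point deviation) can be sketched in Python. This sketch swaps the iterative adjustment for a closed-form least-squares fit: treating points as complex numbers, a single complex factor c = s * e^(i*theta) applies both scaling and rotation about the reference point, and the c that minimizes the summed squared deviation has a direct formula. It assumes the two feature-point lists are already in one-to-one correspondence and that point 0 of each list is the reference point; the RMS deviation stands in for the "deviation amount", and match_character is a hypothetical name.

```python
import math


def match_character(target_pts, ref_pts, max_deviation):
    """Compare corresponding feature points of a target and a reference glyph.

    Assumptions (not from the source): both lists hold (x, y) points in
    one-to-one correspondence, and point 0 of each list is the reference point.
    Returns (is_match, rms_deviation).
    """
    if len(target_pts) != len(ref_pts) or len(ref_pts) < 2:
        raise ValueError("need equally long point lists with at least 2 points")
    # Complex coordinates relative to each reference point: this superimposes
    # the two reference points at the origin.
    t0, r0 = complex(*target_pts[0]), complex(*ref_pts[0])
    t = [complex(x, y) - t0 for x, y in target_pts]
    r = [complex(x, y) - r0 for x, y in ref_pts]
    # Multiplying by c = s * exp(i * theta) scales by s and rotates by theta
    # about the reference point. This c minimises sum |c * t_k - r_k|^2.
    denom = sum(abs(z) ** 2 for z in t)
    if denom == 0:
        raise ValueError("all target points coincide with the reference point")
    c = sum(z.conjugate() * w for z, w in zip(t, r)) / denom
    rms = math.sqrt(sum(abs(c * z - w) ** 2 for z, w in zip(t, r)) / len(t))
    return rms < max_deviation, rms


square = [(0, 0), (10, 0), (10, 10), (0, 10)]
# The same square, rotated 90 degrees, halved in size and shifted.
moved = [(100, 50), (100, 55), (95, 55), (95, 50)]
print(match_character(moved, square, max_deviation=1.0))
```

A rotated, scaled copy of the reference aligns with essentially zero deviation, whereas swapping two corners leaves a residual that no single scale and rotation can remove.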

Through image analysis such as this pattern matching, the image recognition unit 20 extracts the characters contained in the input moving image signal and detects whether a character string representing the keyword is present. To avoid confusing characters with similarly shaped graphics, such as the katakana "ロ" (ro) and a drawn square, the presence of a keyword is judged from the continuity of the characters and similar cues. If a character string representing the keyword is present, its screen occupancy, display time, and color are also detected and output to the level determination unit 22.

Here, the screen occupancy is the ratio of the area occupied by the keyword to the area of the whole screen. The area occupied by the keyword is the area of the smallest rectangle that contains the entire keyword. For example, in the image shown in FIG. 3, if the keyword 50 is "AAAA", its occupied area is the area of the rectangle 52 indicated by the broken line. For a 720 × 480-dot screen and a keyword area (rectangle 52) of 216 × 46 dots, the screen occupancy is (216 × 46) / (720 × 480) × 100% ≈ 2.9%, that is, about 3%.
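The occupancy calculation can be checked with a short sketch. bounding_box computes the smallest rectangle enclosing the per-character boxes of a keyword, assuming (hypothetically) that the recognizer reports each character as an (x, y, width, height) box in dots; screen_occupancy then reproduces the 216 × 46 on 720 × 480 example. Both function names are illustrative.

```python
def bounding_box(char_boxes):
    """Smallest axis-aligned rectangle (x, y, w, h) enclosing all character boxes."""
    x0 = min(x for x, _, _, _ in char_boxes)
    y0 = min(y for _, y, _, _ in char_boxes)
    x1 = max(x + w for x, _, w, _ in char_boxes)
    y1 = max(y + h for _, y, _, h in char_boxes)
    return x0, y0, x1 - x0, y1 - y0


def screen_occupancy(box_w, box_h, screen_w=720, screen_h=480):
    """Percentage of the screen area covered by a box_w x box_h rectangle."""
    return 100.0 * (box_w * box_h) / (screen_w * screen_h)


# Four 54 x 46-dot characters side by side, e.g. the caption "AAAA".
chars = [(100 + 54 * k, 200, 54, 46) for k in range(4)]
_, _, w, h = bounding_box(chars)
print(w, h, screen_occupancy(w, h))
```

With four 54 × 46 character boxes side by side, the enclosing rectangle is 216 × 46 dots and the occupancy is exactly 2.875%.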

The image recognition unit 20 also determines the color of the keyword and outputs it to the level determination unit 22 as numerical values in a color model such as RGB. If a keyword contains several colors, the RGB value of each color is output. For example, if in the keyword "DVD recorder" the part "DVD" is red and "recorder" is black, both the RGB value for red and the RGB value for black are output to the level determination unit 22.

The keyword detection method described here is only an example; any other method that can detect the presence of a keyword may of course be used. Although this embodiment detects volume, display time, screen occupancy, color, and so on in addition to the presence of a keyword, these need not necessarily be detected. Likewise, although this embodiment detects keywords both in speech and in images, only one of the two may be detected. Furthermore, in addition to the registered keyword, aliases and synonyms of the keyword may also be detected as that keyword. For example, when the keyword is "DVD", the phrase "digital video disc" may also be detected as the keyword. Such aliases may be registered by the user or held in advance as a dictionary.
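Alias and synonym matching could be sketched as a dictionary lookup over recognized text. The ALIASES table and find_keywords function are hypothetical, and the naive case-insensitive substring test shown here would also match inside longer words, so a real implementation would need proper tokenization.

```python
# Hypothetical alias table: registered keyword -> names that count as it.
ALIASES = {
    "DVD": ["DVD", "digital video disc"],
}


def find_keywords(text, aliases=ALIASES):
    """Return the registered keywords whose name or any alias occurs in text."""
    lowered = text.lower()
    return {
        keyword
        for keyword, names in aliases.items()
        if any(name.lower() in lowered for name in names)
    }


print(find_keywords("A new Digital Video Disc player"))
```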

Next, level determination by the level determination unit 22 is described in detail. As described above, the level determination unit 22 receives the keyword detection results, volume, screen occupancy, and so on from the voice recognition unit 18 and the image recognition unit 20. From this information it first calculates the keyword occurrence amount for each scene. The occurrence amount is based on the number of keyword occurrences, with each occurrence weighted by coefficients derived from the volume, screen occupancy, and so on.

Specifically, the occurrence amount Da of a keyword that occurs as speech is calculated from the number of occurrences and a coefficient based on the volume at each occurrence. Suppose the keyword occurs n times in one scene, with volume Ai (i = 1, 2, ..., n) at the i-th occurrence. Da is then the sum, over all occurrences, of the product of a base value B (for example, 1) counted per occurrence and a coefficient Kia based on the volume Ai:

Da = Σ (i = 1 to n) B × Kia

Various settings are possible for the volume coefficient Kia. In this embodiment, Kia is 0 when the volume is below a predetermined first reference value (for example, 24 dB), 1 when it is at or above the first reference value and below a second reference value (for example, 66 dB), and 1.5 when it is at or above the second reference value. An occurrence at a volume below the first reference value therefore contributes 0 to Da and is treated as if the keyword had not occurred, whereas a keyword spoken loudly increases Da. The reason is that when a keyword is spoken quietly, the scene is often only weakly related to it, and assigning an index to such a scene is undesirable. Conversely, when a keyword is spoken loudly, the scene is often strongly related to it, so it is desirable to make that scene more likely to be indexed.
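The volume-weighted amount Da can be written directly from the definitions above, using the example thresholds of 24 dB and 66 dB and B = 1. Volumes are assumed to arrive already measured in dB, one value per occurrence; the function names are illustrative.

```python
def volume_coefficient(volume_db, first_db=24.0, second_db=66.0):
    """Kia: 0 below the first reference, 1 up to the second, 1.5 at or above it."""
    if volume_db < first_db:
        return 0.0
    if volume_db < second_db:
        return 1.0
    return 1.5


def speech_amount(volumes_db, base=1.0):
    """Da: sum of B * Kia over the keyword's spoken occurrences in one scene."""
    return sum(base * volume_coefficient(v) for v in volumes_db)


print(speech_amount([20.0, 40.0, 70.0]))
```

Three occurrences at 20, 40 and 70 dB contribute 0, 1 and 1.5, giving Da = 2.5.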

When the keyword appears as on-screen text, the keyword generation amount Dv is calculated as a sum of products over all occurrences. For each occurrence, a base value B (for example, 1) is multiplied by a weighting coefficient Kib based on the display time of that occurrence (hereinafter the "time coefficient"), a weighting coefficient Kic based on the screen occupancy ratio (hereinafter the "occupancy coefficient"), and a weighting coefficient Kid based on color (hereinafter the "color coefficient"). That is, Dv = ΣB × Kib × Kic × Kid. The time coefficient Kib is set, for example, to 0 when the display time is less than a first reference value (for example, 1 sec), to 1 when it is at least the first reference value but less than a second reference value (for example, 3 sec), and to 1.5 when it is at least the second reference value. Similarly, the occupancy coefficient Kic is set, for example, to 0 when the screen occupancy ratio is less than a first reference value (for example, 2%), to 1 when it is at least the first reference value but less than a second reference value (for example, 5%), and to 1.5 when it is at least the second reference value. The color coefficient Kid is set, for example, to 1 for black (RGB = 0, 0, 0) and to 1.5 for any other color.
With these coefficients, a keyword string that is displayed only momentarily or very small contributes nothing, so it is treated as if the keyword had not occurred. Conversely, when the keyword is displayed for a long time, in a large size, or in colored characters, the keyword generation amount Dv increases and the scene becomes more likely to be indexed.
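The weighted sum Dv = ΣB × Kib × Kic × Kid can be sketched as follows. This is a minimal illustration, not the apparatus's implementation: the `TextOccurrence` structure and the function names are hypothetical, and the thresholds and coefficient values are simply the example values given in the text.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class TextOccurrence:
    """One on-screen occurrence of the keyword string (hypothetical structure)."""
    display_time_s: float          # how long the string stayed on screen
    occupancy: float               # fraction of the screen it covered (0.02 == 2%)
    rgb: Tuple[int, int, int]      # character color


def time_coefficient(t: float, first: float = 1.0, second: float = 3.0) -> float:
    """Kib: 0 below 1 s, 1 from 1 s up to 3 s, 1.5 at 3 s or more."""
    if t < first:
        return 0.0
    return 1.0 if t < second else 1.5


def occupancy_coefficient(r: float, first: float = 0.02, second: float = 0.05) -> float:
    """Kic: 0 below 2 %, 1 from 2 % up to 5 %, 1.5 at 5 % or more."""
    if r < first:
        return 0.0
    return 1.0 if r < second else 1.5


def color_coefficient(rgb: Tuple[int, int, int]) -> float:
    """Kid: 1 for black text, 1.5 for any other color."""
    return 1.0 if rgb == (0, 0, 0) else 1.5


def text_generation_amount(occurrences: Iterable[TextOccurrence], base: float = 1.0) -> float:
    """Dv = sum over occurrences of B * Kib * Kic * Kid."""
    return sum(
        base
        * time_coefficient(o.display_time_s)
        * occupancy_coefficient(o.occupancy)
        * color_coefficient(o.rgb)
        for o in occurrences
    )


occurrences = [
    TextOccurrence(0.5, 0.10, (255, 0, 0)),  # flashed for half a second, so Kib = 0
    TextOccurrence(2.0, 0.03, (0, 0, 0)),    # 1 * 1 * 1 = 1
    TextOccurrence(4.0, 0.06, (255, 0, 0)),  # 1.5 * 1.5 * 1.5 = 3.375
]
print(text_generation_amount(occurrences))  # 4.375
```

Because any zero coefficient zeroes the whole product, the momentary caption in the example contributes nothing even though it is large and colored, which matches the "treated as if the keyword had not occurred" behavior described above.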

The level determination unit 22 determines the levels La and Lv of each scene from the two calculated keyword generation amounts Da and Dv, respectively. The levels La and Lv are defined in multiple steps, level 0, level 1, level 2, and so on, in ascending order of the keyword generation amounts Da and Dv. The level determination unit 22 outputs the level determination result to the index generation unit 24.
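Mapping a generation amount onto levels 0, 1, 2, ... is a simple bucketing step. The sketch below assumes hypothetical boundaries of 1, 3, and 6; the text does not specify the actual boundaries or how many levels there are.

```python
from bisect import bisect_right
from typing import Sequence

# Hypothetical boundaries: the text only says levels 0, 1, 2, ... rise with the amount.
LEVEL_BOUNDARIES = (1.0, 3.0, 6.0)


def level_of(amount: float, boundaries: Sequence[float] = LEVEL_BOUNDARIES) -> int:
    """Return the level for a keyword generation amount (Da or Dv).

    Amounts below boundaries[0] map to level 0, [b0, b1) to level 1, and so on;
    anything at or above the last boundary gets the top level.
    """
    return bisect_right(boundaries, amount)


for amount in (0.5, 1.0, 4.375, 10.0):
    print(amount, "->", level_of(amount))
```

Using `bisect_right` on a sorted tuple keeps the mapping correct for any number of levels, so adding a level only means adding a boundary.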

The index generation unit 24 adds the two levels La and Lv and determines whether the sum is equal to or greater than a predetermined reference value. If it is, in other words, if the keyword generation amounts Da and Dv are at or above a certain level, the unit generates index information for that scene. This reference value may be fixed or may be set by the user as appropriate. Alternatively, it may be varied according to the type of video being recorded (for example, a music program, news, or a sports broadcast).
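The decision rule (index the scene when La + Lv reaches the reference value) can be sketched as below. The per-genre reference values, the default of 3, and the rule that a user-set value overrides the genre value are all hypothetical choices made for illustration; the text only says the reference may be fixed, user-set, or genre-dependent.

```python
from typing import Optional

# Hypothetical per-genre reference values.
GENRE_REFERENCE = {"music": 2, "news": 3, "sports": 4}
DEFAULT_REFERENCE = 3


def should_index(la: int, lv: int, genre: Optional[str] = None,
                 user_reference: Optional[int] = None) -> bool:
    """Return True if the scene with levels La (audio) and Lv (text) gets an index."""
    if user_reference is not None:
        reference = user_reference  # assumption: an explicit user setting wins
    else:
        reference = GENRE_REFERENCE.get(genre, DEFAULT_REFERENCE)
    return la + lv >= reference


print(should_index(1, 1))           # below the default reference of 3
print(should_index(1, 1, "music"))  # music programs use a lower reference here
```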

By assigning an index only when the level (keyword generation amount) is equal to or greater than a predetermined value in this way, indexes can be restricted to scenes that are more reliably related to the keyword. Even in a scene that is essentially unrelated to the keyword, a person in the video may mutter the keyword, or a caption (telop) containing the keyword may be displayed by mistake. In addition, a speech or image recognition error may cause a word different from the registered keyword to be detected as the keyword. If such scenes were indexed as well, the number of indexed scenes would grow, and the user could no longer tell which scenes are genuinely related to the keyword. In this embodiment, by contrast, an index is assigned only when the level is equal to or greater than a predetermined value, that is, only when the keyword genuinely occurs frequently, so indexes are reliably assigned only to scenes related to the keyword. As a result, the desired scene can be found easily.

The method of calculating the keyword generation amounts Da and Dv described here is only an example; other methods may of course be used. For instance, the volume, screen occupancy ratio, and so on need not be used in addition to the number of keyword occurrences; the keyword generation amount may be calculated from the number of occurrences alone. Even when weighting coefficients are used, their values may be set in various ways other than those described above, and weighting coefficients other than those above may also be used.

Next, the index display screen that is displayed when the index information is reproduced will be described with reference to FIGS. 4 to 6. FIGS. 4 to 6 each show an example of the index display screen 60. FIG. 4 shows the index display screen 60 when the indexing status is displayed with time bars. In this example, a band-shaped time bar 62 is displayed for each keyword, and the length of the time bar 62 represents the duration of the video. The portions of the time bar 62 corresponding to indexed times carry a mark 64 in a color different from the rest of the bar. Thus, in the example of FIG. 4, keyword A occurred and was indexed at times t1 and t4, and keyword B at times t1, t2, and t3. The color of each mark 64 is set according to the index level (keyword generation amount): in the example of FIG. 4, the higher the level, the darker the mark 64, and the lower the level, the lighter it is. In other words, the darker the mark 64, the greater the keyword generation amount and the more closely the scene is related to the set keyword.
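The time-bar idea (bar length equals video length, marks darker for higher levels) can be illustrated with a text rendering. This is only a sketch: `render_time_bar`, the five-step shading string, and the clamping of high levels to the darkest shade are hypothetical, and a real screen would draw colored rectangles instead of characters.

```python
from typing import Dict

# Lighter to darker shading; index 0 means "no mark".
SHADES = " ░▒▓█"


def render_time_bar(duration_s: float, marks: Dict[float, int], width: int = 40) -> str:
    """Render one keyword's time bar as text.

    marks maps an indexed time (seconds) to its level; higher levels draw darker.
    Levels above the darkest shade are clamped to it.
    """
    cells = [0] * width
    for t, level in marks.items():
        col = min(int(t * width / duration_s), width - 1)  # position along the bar
        shade = min(max(level, 0), len(SHADES) - 1)
        cells[col] = max(cells[col], shade)                 # keep the darkest mark per cell
    return "|" + "".join(SHADES[c] for c in cells) + "|"


# A one-hour recording with marks at 10 min (level 1) and 45 min (level 4).
print("keyword A", render_time_bar(3600, {600: 1, 2700: 4}, width=12))
```

Drawing one such bar per keyword gives the layout of FIG. 4; when two marks fall into the same cell, the sketch keeps the darker one so a strongly related scene is never hidden by a weak one.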

Showing the indexing status with the time bar 62 in this way lets the user easily see where in the total video time the keyword occurs, in other words, at what times the content of interest appears. Displaying a time bar 62 for each keyword also makes it easy to see which keyword occurs at which time. Furthermore, coloring the marks 64 according to level makes it easy to judge how closely each indexed scene is related to the keyword. As a result, the user can easily select the desired scene.

The index display screen 60 also serves as a playback position designation screen. That is, when the user designates a playback time on the time bar 62, the video is played back from the selected time. For example, if the user selects the mark 64 at time t1, the video is played back from time t1. Using the index display screen 60 as a playback position designation screen in this way lets the user easily view the desired scene.
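Turning a pointer position on the time bar into a playback start time is a linear mapping from pixels to seconds. The sketch below also snaps to a nearby index mark so that selecting the mark at t1 starts exactly at t1; the function name, the pixel-based geometry, and the 4-pixel snap tolerance are hypothetical, since the text does not describe how the designation is resolved.

```python
from typing import Sequence


def click_to_time(x_px: float, bar_x: float, bar_width_px: float, duration_s: float,
                  mark_times: Sequence[float] = (), snap_px: float = 4.0) -> float:
    """Convert a pointer position on the time bar to a playback start time.

    If the click lands within snap_px pixels of an index mark, start exactly
    at that mark (for example, t1) rather than at the raw pointer position.
    """
    frac = min(max((x_px - bar_x) / bar_width_px, 0.0), 1.0)  # clamp to the bar
    t = frac * duration_s
    if mark_times:
        nearest = min(mark_times, key=lambda m: abs(m - t))
        if abs(nearest - t) * bar_width_px / duration_s <= snap_px:
            return nearest
    return t


# A 400-px bar starting at x = 100 for a one-hour video; a mark sits at 1810 s.
print(click_to_time(300, 100, 400, 3600, mark_times=[1810]))
```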

FIG. 5 shows another index display screen 60. This index display screen 60 shows the indexing status with a bar graph 66 whose horizontal axis represents time and whose vertical axis represents level. The data of the bar graph 66 (bars 68) are drawn in a different color for each keyword. In the example of FIG. 5, keyword A occurs at times t1, t2, t3, and t4, and keyword B occurs at times t1 and t2. The level of each keyword is indicated by the height of its bar 68; thus, in the example of FIG. 5, keyword A occurs heavily at time t3 and keyword B at time t2. Showing the indexing status with such a bar graph 66 makes the level (keyword generation amount) of each scene easier to check. This index display screen can also be used as a playback position designation screen: when the desired playback start time is designated on the bar graph 66, the video is played back from that time.

FIG. 6 shows yet another index display screen 60. Here too, the indexing status is shown by a bar graph 70 whose horizontal axis represents time and whose vertical axis represents level. On this screen, however, the color of the bars 72 varies with how the keyword occurred, so the manner of occurrence of each keyword can be seen in more detail. For example, the level is shown with a red bar when the keyword occurs in speech and with a blue bar when it occurs as text. Furthermore, when a keyword occurs in a special manner, for example at a volume equal to or greater than a reference value, or as text whose screen occupancy ratio is equal to or greater than a reference value, the level corresponding to that special occurrence is shown in yet another color. In the example of FIG. 6, at time t1 the keyword occurs heavily in speech and slightly as text, and at time t3 it occurs in a special manner. Such an index display screen shows the keyword occurrence status of each scene in more detail. This index display screen may be provided for each keyword, or the sum of the levels of all keywords may be plotted as a single bar graph.
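The per-mode coloring of FIG. 6 amounts to splitting each time slot's bar into segments by occurrence mode. The sketch below stacks each occurrence's weighted contribution under "voice", "text", or "special". The `Occurrence` structure, the dB volume scale, both "special" thresholds, and the color used for special occurrences are hypothetical; only red for speech and blue for text come from the text.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional

# Red for speech and blue for text follow the text; "green" for special is a placeholder.
MODE_COLORS = {"voice": "red", "text": "blue", "special": "green"}


@dataclass
class Occurrence:
    slot: int                          # time slot (scene) on the horizontal axis
    source: str                        # "voice" or "text"
    amount: float                      # weighted contribution to the generation amount
    volume_db: Optional[float] = None  # speech only
    occupancy: Optional[float] = None  # text only, fraction of the screen


def mode_of(o: Occurrence, volume_ref_db: float = -10.0, occupancy_ref: float = 0.05) -> str:
    """Classify an occurrence as 'voice', 'text' or 'special' (hypothetical thresholds)."""
    if o.source == "voice" and o.volume_db is not None and o.volume_db >= volume_ref_db:
        return "special"
    if o.source == "text" and o.occupancy is not None and o.occupancy >= occupancy_ref:
        return "special"
    return o.source


def stacked_bars(occurrences: List[Occurrence]) -> Dict[int, Dict[str, float]]:
    """Per time slot, the bar height contributed by each occurrence mode."""
    bars: Dict[int, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for o in occurrences:
        bars[o.slot][mode_of(o)] += o.amount
    return {slot: dict(parts) for slot, parts in sorted(bars.items())}


occs = [
    Occurrence(1, "voice", 3.0, volume_db=-20.0),
    Occurrence(1, "text", 1.0, occupancy=0.03),
    Occurrence(3, "voice", 2.0, volume_db=-5.0),  # loud, so drawn in the "special" color
]
for slot, parts in stacked_bars(occs).items():
    print(slot, {MODE_COLORS[mode]: height for mode, height in parts.items()})
```

Summing the segments of one slot gives the single-color bar of FIG. 5, so the same data can feed either screen.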

Next, the flow of recording (recording a video signal) with the video recording / reproducing apparatus 10 will be described with reference to FIGS. 7 and 8. Before recording starts, the user sets keywords via the user interface 29 (S12). The set keywords are stored in the keyword storage unit 26. Recording then starts in response to a user instruction (S14). The video signal input from the external device 34 undergoes A/D conversion, MPEG encoding, and the like as necessary, and is then input to the recording unit 12 and the keyword generation amount detection unit 16. The recording unit 12 records the input video signal on the recording medium 14 sequentially (S16). Preferably, the video signal from the external device 34 is input to the keyword generation amount detection unit 16 immediately after A/D conversion and noise reduction. With this configuration, the detection unit receives a signal that carries more data and, being uncompressed, has a better S/N ratio. Furthermore, if the same video signal is sent to both the keyword generation amount detection unit 16 and the MPEG encoding unit (not shown), degradation of the video signal can also be reduced.

In parallel with video recording, indexes are also generated and recorded (S18). This flow is shown in FIG. 8. The voice recognition unit 18 and the image recognition unit 20 of the keyword generation amount detection unit 16 divide the input audio and video signals into scenes of unit time and determine whether a keyword occurs in each scene (S22), using the speech and image recognition techniques described above. If no keyword occurs, processing moves on to index generation and recording for the next scene (S32, S34). If a keyword occurs, the speech/image recognition result is output to the level determination unit 22, which calculates the keyword generation amount from that result and determines the level of the scene (S24). The level determination result is output to the index generation unit 24 together with the speech/image recognition result.

The index generation unit 24 determines whether the level of the scene is equal to or greater than a predetermined reference value (S26). If the level is below the reference value, no index is assigned to the scene and processing moves on to index generation and recording for the next scene (S32, S34). If the level is equal to or greater than the reference value, index information is created from the speech/image recognition result and the level determination result (S28). The created index information is passed to the recording unit 12, which records it on the recording medium 14 in association with the video signal for that scene (S30). This flow is repeated for each scene, and index generation and recording end when there are no more scenes.
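The per-scene loop of FIG. 8 (S22 to S34) can be sketched as below. This assumes the recognition units have already produced Da and Dv for each scene, uses hypothetical level boundaries of 1, 3, and 6, and compares La + Lv with the reference as described for the index generation unit 24; the `Scene` and `IndexInfo` structures are illustrative only, and writing to the medium (S30) is left out.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Scene:
    start_s: float
    audio_amount: float  # Da from the voice recognition unit (assumed precomputed)
    text_amount: float   # Dv from the image recognition unit (assumed precomputed)


@dataclass
class IndexInfo:
    start_s: float
    level_audio: int
    level_text: int


def default_level(amount: float) -> int:
    # Hypothetical level boundaries; the text does not give actual values.
    return bisect_right((1.0, 3.0, 6.0), amount)


def generate_indexes(scenes: Iterable[Scene], reference: int,
                     level_of: Callable[[float], int] = default_level) -> List[IndexInfo]:
    indexes: List[IndexInfo] = []
    for scene in scenes:                          # S32/S34: move on to the next scene
        if scene.audio_amount <= 0 and scene.text_amount <= 0:
            continue                              # S22: no keyword in this scene
        la = level_of(scene.audio_amount)         # S24: level determination
        lv = level_of(scene.text_amount)
        if la + lv < reference:                   # S26: below the reference value
            continue
        indexes.append(IndexInfo(scene.start_s, la, lv))  # S28: create index info
        # S30 would hand this record to the recording unit 12.
    return indexes


scenes = [Scene(0, 0.0, 0.0), Scene(10, 2.0, 0.5), Scene(20, 4.0, 3.5)]
print(generate_indexes(scenes, reference=2))
```

The same function applies unchanged to indexing an already recorded video, since only the source of the scenes differs.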

Next, the flow of newly assigning indexes to a video signal already recorded on the recording medium will be described with reference to FIG. 9. To index a recorded video, the user sets keywords via the user interface 29 (S36). The set keywords are stored in the keyword storage unit 26. The user then instructs the apparatus to start the keyword search (S38). On receiving this instruction, the control unit 28 instructs the playback unit 30 to play back the recorded video, and the playback unit 30 plays back the video recorded on the recording medium 14 accordingly (S40). Because this playback serves the keyword search rather than viewing, it may be performed at a rate different from normal playback. The video signal may also be played back, without outputting pictures or sound, as fast as the processing capacity of the DSP in the keyword generation amount detection unit 16 allows. The reproduced video signal is output to the keyword generation amount detection unit 16. After keyword occurrence and levels have been determined, the index generation unit 24 creates index information; this flow is the same as in FIG. 8, so its description is omitted here. The created index information is additionally recorded on the recording medium 14 by the recording unit 12.
When index information has been recorded for all scenes, the process ends (S44).

Next, the flow of displaying the index display screen will be described with reference to FIG. 10. The user instructs display of the index display screen via the user interface 29 (S46). On receiving this instruction, the control unit 28 instructs the playback unit 30 to reproduce the index information. The playback unit 30 reproduces only the index information and passes it to the index screen generation unit 32 (S48). The index screen generation unit 32 creates the index display screen from the information contained in the index information, such as keyword occurrence positions, levels, and manners of occurrence (S50). The screen created here is, for example, in time-bar or bar-graph form. The created index display screen is shown on a display 36 such as a monitor. While viewing it, the user designates the desired scene to play back (S52) by pointing to the desired position on the time bar or bar graph. The control unit 28 instructs the playback unit 30 to play back the video from the scene corresponding to the designated position, and the playback unit 30 plays back the video from that scene and outputs it to the monitor (S54).

As described above, according to this embodiment an index is assigned only when the keyword generation amount (level) is at or above a certain value, so only scenes with a high keyword generation amount, in other words scenes closely related to the keyword, are indexed. Visually displaying the indexing status also lets the user easily recognize which scenes are closely related to the keyword. Furthermore, because the playback start position can be designated while viewing that status, the desired scene can be watched easily. As a result, the desired scene can be found easily.

In this embodiment, whether to assign an index is decided based on the keyword generation amount, but whether to record the video may be decided in the same way. That is, the video signal input from the external device is temporarily stored in a storage means such as a buffer memory, and the keyword generation amount of each scene is calculated. If a scene has a keyword generation amount equal to or greater than a predetermined value, only that scene and the scenes within a predetermined time before and after it are recorded on the recording medium. Because only the desired scenes are then recorded, the user can watch them after recording without searching for them, and the recording capacity of the recording medium is saved. In this case, it is desirable to record not only the scene in which the keyword occurred but also a certain amount of time before and after it. That time may be a fixed value or may be set by the user as appropriate; it may also be changed according to the type of video (for example, news, a sports broadcast, or a music program).
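Choosing what to record reduces to turning each qualifying scene into a time range padded by the margin and merging ranges that overlap. The sketch below assumes fixed-length scenes, a single threshold, and a margin in seconds; `segments_to_record` is a hypothetical name. In a live recorder, the buffer memory must hold at least the margin's worth of video for the leading part of each range to still be available.

```python
from typing import List, Sequence, Tuple


def segments_to_record(scene_amounts: Sequence[float], scene_len_s: float,
                       threshold: float, margin_s: float) -> List[Tuple[float, float]]:
    """Return merged [start, end) time ranges to write to the recording medium.

    Each scene whose keyword generation amount reaches the threshold is kept
    together with margin_s seconds before and after it.
    """
    total = len(scene_amounts) * scene_len_s
    ranges: List[Tuple[float, float]] = []
    for i, amount in enumerate(scene_amounts):
        if amount < threshold:
            continue
        start = max(0.0, i * scene_len_s - margin_s)
        end = min(total, (i + 1) * scene_len_s + margin_s)
        if ranges and start <= ranges[-1][1]:
            # Overlaps the previous range: extend it instead of starting a new one.
            ranges[-1] = (ranges[-1][0], max(ranges[-1][1], end))
        else:
            ranges.append((start, end))
    return ranges


# Ten 10-second scenes; scenes 1, 6 and 7 contain the keyword often enough.
print(segments_to_record([0, 5, 0, 0, 0, 0, 4, 5, 0, 0], 10, threshold=3, margin_s=10))
```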

Similarly, for video already recorded on a recording medium, scenes to delete may be chosen based on the keyword generation amount. That is, the keyword generation amount of each scene of the recorded video is calculated, and scenes whose keyword generation amount is below a predetermined value are deleted. Even in this case, however, it is desirable to keep, rather than delete, a few minutes of scenes before and after each scene whose keyword generation amount is equal to or greater than the predetermined value.
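For already recorded video, the deletion rule is the complement: a scene is deleted only if it is below the threshold and not within the keep margin of any scene at or above it. This is a sketch assuming the margin is expressed as a number of neighboring scenes; `scenes_to_delete` is a hypothetical name.

```python
from typing import List, Sequence


def scenes_to_delete(scene_amounts: Sequence[float], threshold: float,
                     keep_margin_scenes: int) -> List[int]:
    """Indices of scenes that may be removed from the recording medium."""
    n = len(scene_amounts)
    keep = set()
    for i, amount in enumerate(scene_amounts):
        if amount >= threshold:
            # Protect the qualifying scene and its neighbors on both sides.
            keep.update(range(max(0, i - keep_margin_scenes),
                              min(n, i + keep_margin_scenes + 1)))
    return [i for i in range(n) if i not in keep]


print(scenes_to_delete([0, 5, 0, 0, 0, 0, 4, 5, 0, 0], threshold=3, keep_margin_scenes=1))
```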

Recording video and deleting video scenes based on the keyword generation amount in this way greatly reduces the effort of editing video and, as a result, makes it easy to find the desired scene.

FIG. 1 is a schematic block diagram of a video recording / reproducing apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of character recognition by pattern matching.
FIG. 3 is a diagram explaining the screen occupancy ratio.
FIG. 4 is a diagram showing an example of an index display screen.
FIG. 5 is a diagram showing an example of an index display screen.
FIG. 6 is a diagram showing an example of an index display screen.
FIG. 7 is a flowchart showing the flow of video recording.
FIG. 8 is a flowchart showing the flow of index generation and recording.
FIG. 9 is a flowchart showing the flow of assigning indexes to recorded video.
FIG. 10 is a flowchart showing the flow of displaying the index display screen.

Explanation of symbols

10 video recording / reproducing apparatus, 12 recording unit, 14 recording medium, 16 keyword generation amount detection unit, 18 voice recognition unit, 20 image recognition unit, 22 level determination unit, 24 index generation unit, 26 keyword storage unit, 28 control unit, 29 user interface, 30 playback unit, 32 index screen generation unit, 34 external device, 36 display, 60 index display screen.

Claims

A video recording / reproducing apparatus for recording video on a recording medium,
A storage unit for storing keywords designated by the user in advance;
A detecting means for analyzing the input video and detecting a keyword generation amount per predetermined unit time;
A determination means for determining whether or not a keyword generation amount is equal to or greater than a reference value every predetermined unit time;
Index recording means for recording, when the keyword generation amount is equal to or greater than the reference value, index information indicating the occurrence of the keyword on the recording medium together with information on the position where the keyword occurred;
A video recording / reproducing apparatus comprising:

The video recording / reproducing apparatus according to claim 1,
The video recording / reproducing apparatus, wherein the detecting means calculates the generation amount based on the number of occurrences of the keyword within a predetermined time.

The video recording / reproducing apparatus according to claim 2,
When the keyword occurs in speech,
the video recording / reproducing apparatus, wherein the detecting means calculates the generation amount based on the number of occurrences of the keyword and the volume of the keyword at each occurrence.

The video recording / reproducing apparatus according to claim 2 or 3,
When the keyword occurs as text,
the video recording / reproducing apparatus, wherein the detecting means calculates the generation amount based on the number of occurrences of the keyword and at least one of the color, the display time, and the screen occupancy ratio of the occurring keyword.

The video recording / reproducing apparatus according to any one of claims 1 to 4,
The video recording / reproducing apparatus characterized in that the index recording means also records a keyword generation amount as index information.

6. The video recording / reproducing apparatus according to claim 5, further comprising:
A presentation means for presenting to the user, based on the index information recorded on the recording medium, an index display screen showing the keyword occurrence locations and their generation amounts.

The video recording / reproducing apparatus according to claim 6,
The video recording / reproducing apparatus, wherein a playback start position of the video recorded on the recording medium can be designated on the index display screen.

A video recording / reproducing apparatus for recording video on a recording medium,
A storage unit for storing keywords designated by the user in advance;
A detecting means for analyzing the input video and detecting a keyword generation amount per predetermined unit time;
Index recording means for recording index information indicating a keyword generation position and generation amount on a recording medium in association with the keyword generation position;
Based on the index information recorded on the recording medium, a presentation means for presenting to the user an index display screen showing the occurrence location of the keyword and its generation amount;
A video recording / reproducing apparatus comprising:

A video recording / reproducing apparatus for recording video composed of a plurality of scenes on a recording medium,
A storage unit for storing keywords designated by the user in advance;
A detecting means for analyzing the input video and detecting a keyword generation amount per predetermined unit time;
A determination means for determining whether or not a keyword generation amount is equal to or greater than a reference value every predetermined unit time;
Video recording means for recording the video related to the scene in which the keyword occurred on a recording medium when the keyword generation amount is equal to or greater than the reference value;
A video recording / reproducing apparatus comprising:

A video recording / playback apparatus for playing back video recorded on a recording medium,
A storage unit for storing keywords designated by the user in advance;
A detecting means for analyzing a video recorded on the recording medium and detecting a keyword generation amount per predetermined unit time;
A determination means for determining whether or not a keyword generation amount is equal to or greater than a reference value every predetermined unit time;
Video deletion means for deleting videos whose keyword generation amount is less than the reference value from the recording medium;
A video recording / reproducing apparatus comprising: