JP2012065013A - Moving image file generation method

Info

Publication number: JP2012065013A
Application number: JP2010205532A
Authority: JP
Inventors: Miki Ito; 幹伊藤
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2010-09-14
Filing date: 2010-09-14
Publication date: 2012-03-29

Abstract

PROBLEM TO BE SOLVED: To reduce the heavy load conventionally imposed on an operator when generating, from input video signals, a moving image file of scenes in which persons are smiling.
SOLUTION: A smile determination part 105 of a processor 100 determines a smile level on the basis of the feature amounts of persons detected from input video signals (S304). A file generation part 110 of the processor 100 generates a moving image file including the video signals of a period in which the smile level is higher than a threshold among the input video signals (S312).

Description

The present invention relates to a method of generating a moving image file covering a period determined according to a smile level.

In recent years, various techniques that exploit the recognition of specific patterns in images of a subject have attracted attention. In particular, digital cameras with an auto-shutter function, which recognizes a smile in an image of a person and decides the shooting timing accordingly, are becoming widespread.

Patent Document 1 discloses determining smile points and compositing images using the result, thereby generating an image with which the user is highly satisfied.

JP 2008-198062 A

However, generating a moving image file of scenes in which a person is smiling from an input video signal can impose a heavy load on the operator.
For example, when the operator tries to generate a moving image file by cutting the scenes in which a person is smiling out of a long input video signal, the operator bears a heavy load.

The present invention has been made in view of the above problem. Its object is to generate a highly satisfying moving image file while reducing the operator's load when generating, from an input video signal, a moving image file of scenes in which a person is smiling.

To achieve the above object, a processing apparatus of the present invention has, for example, the following configuration: input means for inputting a video signal; determination means for determining a smile level from feature amounts of a person detected from the input video signal; comparison means for comparing the determined smile level with a threshold; and generation means for generating a moving image file from the video signal of a period that includes a period, within the input video signal, in which the determined smile level is higher than the threshold.

According to the present invention, a highly satisfying moving image file can be generated while reducing the operator's load when generating, from an input video signal, a moving image file of scenes in which a person is smiling.

The drawings show: a block diagram of the image processing apparatus; a diagram explaining the relationship between the smile level and the extracted portion of the video signal; a flowchart explaining the processing of the image processing apparatus; an example screen display of a captured image; and the network connection configuration of the image processing apparatus.

Hereinafter, the present invention will be described in detail based on its preferred embodiments with reference to the accompanying drawings. The configurations shown in the following embodiments are merely examples, and the present invention is not limited to the illustrated configurations.

<First Embodiment>
FIG. 1 is a block diagram of an image processing apparatus 100 (hereinafter, processing apparatus 100) according to this embodiment. In FIG. 1, an imaging unit 101 applies signal processing and the like to an image signal captured by an image sensor such as a CMOS (Complementary Metal Oxide Semiconductor) sensor, and outputs the resulting YUV-format video signal to a color conversion unit 102. The video signal in this embodiment is a digital video signal. The imaging unit 101 of this embodiment inputs a temporally continuous video signal in response to an instruction from the user. The imaging unit 101 may also be provided separately from the processing apparatus 100. In that case, the video signal obtained by the imaging apparatus is input to the processing apparatus 100 through a standard interface such as HDMI (High-Definition Multimedia Interface).
The color conversion unit 102 converts the YUV video signal from the imaging unit 101 into an RGB video signal. A face detection unit 103 processes the RGB video signal from the color conversion unit 102 to detect whether a face is present and where it is located.
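The text does not state which conversion matrix the color conversion unit 102 uses. As a minimal sketch, assuming full-range 8-bit BT.601 coefficients (this choice and the function name `yuv_to_rgb` are assumptions, not taken from the patent), a per-pixel YUV-to-RGB conversion could look like this:

```python
def _clamp8(x):
    """Round to the nearest integer and clamp to the 8-bit range 0..255."""
    return max(0, min(255, int(round(x))))


def yuv_to_rgb(y, u, v):
    """Convert one full-range 8-bit YUV pixel to RGB.

    Assumption: BT.601 full-range coefficients; the patent names no matrix.
    """
    d = u - 128  # blue-difference chroma, centered on zero
    e = v - 128  # red-difference chroma, centered on zero
    r = y + 1.402 * e
    g = y - 0.344136 * d - 0.714136 * e
    b = y + 1.772 * d
    return _clamp8(r), _clamp8(g), _clamp8(b)
```

With neutral chroma (u = v = 128) the output is a gray pixel whose three channels equal the luma value.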

The face detection unit 103 of this embodiment detects a face image from the video signal based on learning that uses a large number of face images (positive examples) and non-face images (negative examples) stored in advance. More specifically, based on this learning, the face detection unit 103 detects the positions of the eyes, mouth, and face from the video signal. The face image detection method is not limited to this one, however.

A face tracking unit 104 tracks the face image detected by the face detection unit 103 within the screen, using optical flow or a similar technique.

A smile determination unit 105 determines a smile level from the feature amounts of the face image detected by the face detection unit 103 and tracked by the face tracking unit 104. The smile determination unit 105 of this embodiment extracts the eyebrows, eyes, and lips from the face image. It determines that the smile level is higher the lower the outer ends of the eyebrows are, the more open the pupils of the eyes are, and the higher both ends of the lips are raised relative to the center. The smile level determination method is not limited to the above, however.
The smile determination unit 105 inputs a control signal to a switch unit 109 according to its smile level determination result. Based on the control signal from the smile determination unit 105, the switch unit 109 switches whether to pass the YUV video signal from the imaging unit 101 to a file generation unit 110. That is, the switch unit 109 switches, according to the input from the smile determination unit 105, whether to pass the video signal from the imaging unit 101 to the file generation unit 110, so that the video signal of the period in which the smile level of the face image is determined to be higher than the threshold is passed to the file generation unit 110.

A frame specifying unit 106 specifies, among the frames of the moving image file period, the frame containing the face image that the smile determination unit 105 determined to have the highest smile level. That is, the frame specifying unit 106 receives from the smile determination unit 105 a smile level and the identification information of the frame for which that smile level was determined. If the received smile level is higher than every other smile level received since the moving image file period started, the frame specifying unit 106 outputs the identification information of that frame to a frame cutout unit 107. For example, immediately after the moving image file period starts, the frame specifying unit 106 outputs the identification information of the first frame to the frame cutout unit 107. If the smile level of the second frame is determined to be higher than that of the first frame, the frame specifying unit 106 outputs the identification information of the second frame. If, on the other hand, the smile level of the second frame is determined to be lower than that of the first frame, the frame specifying unit 106 does not output the identification information of the second frame to the frame cutout unit 107. When the moving image file period ends, the frame specifying unit 106 notifies the frame cutout unit 107 of this.

The frame cutout unit 107 cuts the frame specified by the frame specifying unit 106 out of the YUV video signal from the imaging unit 101 and holds it temporarily. For example, immediately after the moving image file period starts, the frame cutout unit 107 temporarily holds the video signal of the first frame. If, for example, the smile level of the second frame is higher than that of the first frame, it temporarily holds the video signal of the second frame in place of that of the first frame.

The processing apparatus 100 has a buffer sized to account for the delay of the processing performed from the color conversion unit 102 through the frame specifying unit 106, which allows the frame cutout unit 107 to hold the frame specified by the frame specifying unit 106. When the moving image file period ends, the frame cutout unit 107 outputs the video signal of the held frame to a thumbnail generation unit 108. The video signal output at this point is that of the frame with the highest smile level, as determined by the smile determination unit 105, among the frames of the moving image file period.

The thumbnail generation unit 108 generates a thumbnail image of the moving image file from the video signal output by the frame cutout unit 107. With this configuration, the thumbnail generation unit 108 generates the thumbnail of the moving image file based on a second moving image frame whose second smile level is higher than the first smile level of the first moving image frame (the first frame) of the moving image file period.
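The interplay of the frame specifying unit 106 and the frame cutout unit 107 amounts to keeping a running maximum. The following is a minimal sketch of that idea; the class name `BestFrameKeeper` and its methods are hypothetical, and frames are represented by arbitrary Python objects rather than YUV buffers.

```python
class BestFrameKeeper:
    """Hold the frame with the highest smile level seen in the current period."""

    def __init__(self):
        self.best_level = None
        self.best_frame = None

    def offer(self, frame, smile_level):
        """Replace the held frame only if smile_level is strictly higher.

        Returns True when the held frame was replaced.
        """
        if self.best_level is None or smile_level > self.best_level:
            self.best_level = smile_level
            self.best_frame = frame
            return True
        return False

    def finish(self):
        """End the period: return the held frame (thumbnail source) and reset."""
        frame = self.best_frame
        self.best_level = None
        self.best_frame = None
        return frame
```

The first frame of a period is always held, matching the description above; later frames replace it only when their smile level is higher.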

The file generation unit 110 receives the video signal from the imaging unit 101 via the switch unit 109 and generates a moving image file based on a format such as AVCHD (Advanced Video Codec High Definition). That is, the file generation unit 110 generates a moving image file from the video signal of a period that includes a period, within the input video signal, in which the determined smile level is higher than the threshold.

Next, the operation of the processing apparatus 100 of this embodiment will be described with reference to the flowchart of FIG. 3. This embodiment describes an example in which the processing of each unit of the processing apparatus 100 is performed by dedicated hardware, but at least part of that processing may instead be performed by a CPU of the processing apparatus 100 reading a program from a ROM or HDD and executing it. In this embodiment, the processing of FIG. 3 starts when the smile file generation mode is selected. It may, however, also be started in, for example, a normal moving image shooting mode.
In S301 (input procedure) of FIG. 3, the imaging unit 101 captures images and inputs the resulting video signal. The imaging unit 101 may also be an imaging apparatus separate from the processing apparatus 100. In that case, the video signal from the imaging apparatus is input in S301.

In S302, a control unit (not shown) of the processing apparatus 100 determines whether the end of moving image shooting has been instructed. If it has not, the process proceeds to S303; if it has, the processing of FIG. 3 ends.

In S303, the color conversion unit 102 converts the YUV video signal obtained by the imaging unit 101 into an RGB video signal. The RGB video signal is output to the face detection unit 103. In this embodiment, the YUV video signal and the RGB video signal are both digital video signals, and the input video signal is temporally continuous.

In S304, the face detection unit 103 detects the face images of persons from the RGB video signal. The face detection unit 103 of this embodiment detects a face image from the video signal based on learning that uses a large number of face images (positive examples) and non-face images (negative examples) stored in advance. The face detection unit 103 outputs the coordinate data of the four vertices of the rectangle corresponding to each detected face image region, together with the RGB video signal, to the face tracking unit 104. FIG. 4 shows an example of the face image regions when three face images are detected.

Also in S304, the face tracking unit 104 tracks the face images detected by the face detection unit 103, based on the coordinate data from the face detection unit 103. Through this tracking process, face images of the same person detected in the immediately preceding frame and in the current frame are associated with each other. The face tracking unit 104 also smooths the associated face images along the time axis so that their size does not change greatly from frame to frame. That is, the face tracking unit 104 corrects the face image coordinate data output by the face detection unit 103. The face tracking unit 104 then outputs the smoothing result to the smile determination unit 105 in association with the ID of the face image. For example, as shown in FIG. 4, when the face images of three persons (persons 1, 2, and 3) are detected by the face detection unit 103, the face tracking unit 104 outputs the ID of person 1 in association with the smoothed coordinate data of the rectangular region of person 1. Likewise, it outputs the ID of person 2 in association with the smoothed coordinate data of the rectangular region of person 2, and the ID of person 3 in association with the smoothed coordinate data of the rectangular region of person 3.
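The text says only that the coordinates are smoothed along the time axis and keyed by face ID; it does not name a filter. As an illustrative sketch, assuming an exponential moving average and a rectangle stored as two opposite corners (both assumptions, as is the name `smooth_boxes`), the per-ID smoothing could be written as:

```python
def smooth_boxes(prev, current, alpha=0.5):
    """Smooth face rectangles over time, per face ID.

    prev, current: dict mapping face ID -> (x0, y0, x1, y1).
    alpha: weight of the current frame (hypothetical parameter).
    Faces with no previous rectangle are passed through unchanged.
    """
    smoothed = {}
    for face_id, box in current.items():
        if face_id in prev:
            smoothed[face_id] = tuple(
                alpha * c + (1 - alpha) * p for c, p in zip(box, prev[face_id])
            )
        else:
            smoothed[face_id] = tuple(float(c) for c in box)
    return smoothed
```

Feeding each call's result back in as `prev` for the next frame keeps the rectangle size from jumping between frames.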

Managing persons by ID in this way makes it possible, for example, to generate a moving image file of the period during which one or more persons designated by the user are smiling, or of the period during which a person who has been on camera for a long time is smiling. When generating a moving image file of the period during which a user-designated person is smiling, the person can be designated based on designation information from an input unit of the processing apparatus 100 or from a terminal device connected to the processing apparatus via a network. That is, the processing apparatus 100 has a designation unit (not shown) that designates, from among the plurality of detected persons, the person to be used for smile level determination. Separating persons by ID also makes it possible to create a composite photograph of smiles or a smile icon for each person.

Also in S304 (determination procedure), the smile determination unit 105 determines the smile level of each face image based on the face image ID, the coordinate data, and the video signal output from the face tracking unit 104. The smile determination unit 105 of this embodiment extracts the eyebrows, eyes, and lips from the face image specified by the coordinate data. It determines that the smile level is higher the lower the outer ends of the eyebrows are, the more open the pupils of the eyes are, and the higher both ends of the lips are raised relative to the center. According to the state of the eyebrows, eyes, and lips, the smile determination unit 105 determines the smile level of each face image in the range of 0 to 1.4. When the smile determination unit 105 has finished determining the smile level, the process proceeds to S305.

In S305 (comparison procedure), the smile determination unit 105 compares the smile level determined in S304 with a threshold. If the smile level is determined to be higher than the threshold, the process proceeds to S306 and generation of the moving image file starts; while the smile level remains higher than the threshold, generation of the moving image file continues. If the smile level is determined not to be higher than the threshold, the process proceeds to S309: if a moving image file is being generated, its generation is ended and the process proceeds to S311; if no moving image file is being generated, the process returns to S301.
The relationship between the smile level and the period of the moving image file is described next.

The first mode determines a smile level for each of the face images of a plurality of persons and generates a moving image file of the period in which at least one of the determined smile levels exceeds the threshold. FIG. 2 shows, for the first mode, the relationship between the changes in the smile level of each face image and the period of the moving image file when three face images are detected. In this embodiment, the threshold compared with the smile level of a face image is 0.6. As shown in FIG. 2, recording of the moving image file starts when the smile level of one of the three persons (person 2) exceeds 0.6 and ends when the smile levels of all persons have fallen below 0.6.

That is, the switch unit 109 passes to the file generation unit 110 the YUV video signal of the period in which at least one person's smile level exceeds 0.6, and does not pass the video signal of the period in which no person's smile level exceeds 0.6. In this way, the switch unit 109 controls the recording of the YUV video signal from the imaging unit 101 based on a control signal from the smile determination unit 105 that reflects the result of comparing the smile level with the threshold. In other words, among the plurality of persons detected from the video signal (for example, a first person and a second person), the smile determination unit 105 compares the smile level of the first person, whose smile level is high, with the threshold. The switch unit 109 then passes to the file generation unit 110 the video signal of the period in which the smile level of that first person is higher than the threshold. Alternatively, for example, among the face images larger than a predetermined size, the face image with the highest smile level may be compared with the threshold.
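The first mode can be illustrated with a short sketch. It assumes the per-frame smile levels are already available as lists of numbers, and the function name `smile_periods` is hypothetical. It returns the inclusive frame-index ranges during which at least one smile level exceeds the threshold.

```python
def smile_periods(frames, threshold=0.6):
    """Return [(start, end)] frame-index ranges (inclusive) in which at
    least one person's smile level is higher than the threshold.

    frames: list of per-frame lists of smile levels, one entry per face.
    """
    periods = []
    start = None
    for i, levels in enumerate(frames):
        active = any(level > threshold for level in levels)
        if active and start is None:
            start = i  # a smile level crossed the threshold: period begins
        elif not active and start is not None:
            periods.append((start, i - 1))  # all levels dropped: period ends
            start = None
    if start is not None:
        periods.append((start, len(frames) - 1))  # input ended mid-period
    return periods
```

Each returned range corresponds to one moving image file in the description above.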

The second mode determines a smile level for each of the face images of a plurality of persons and generates a moving image file of the period in which the sum of the determined smile levels exceeds the threshold. That is, the smile determination unit 105 compares the sum of the smile levels of the detected persons with the threshold, and the switch unit 109 passes to the file generation unit 110 the video signal of the period in which that sum is higher than the threshold. In the second mode, the threshold can further be set in one of two ways: a mode in which the threshold changes according to the number of face images, and a mode in which the threshold is constant regardless of the number of face images. Compared with the first mode, the second mode determines the period of the moving image file with the overall smile level taken into account.

The third mode determines a smile level for each of the face images of a plurality of persons and generates a moving image file of the period in which the average of the determined smile levels exceeds the threshold. That is, the smile determination unit 105 compares the average smile level of the detected persons with the threshold, and the switch unit 109 passes to the file generation unit 110 the video signal of the period in which that average is higher than the threshold. The third mode determines the period of the moving image file with simpler processing than varying the threshold according to the number of face images. Letting the user select among these modes according to the number of face images and the kind of moving image file desired makes it possible to generate a more satisfying moving image file.
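The three modes differ only in how the per-face smile levels are combined before the comparison. A minimal sketch follows; the mode names `"any"`, `"sum"`, and `"mean"`, the function name, and the proportional scaling of the threshold with the number of faces are all assumptions, since the text does not say how the threshold varies.

```python
from statistics import mean


def exceeds_threshold(levels, mode, threshold, scale_with_faces=False):
    """Decide whether the current frame belongs to a smile period.

    levels: smile levels of the faces detected in the frame.
    mode: "any" (first mode), "sum" (second mode), or "mean" (third mode).
    scale_with_faces: in "sum" mode, multiply the threshold by the number
        of faces (one assumed way of varying the threshold with face count).
    """
    if not levels:
        return False  # no face detected, nothing to compare
    if mode == "any":
        return max(levels) > threshold
    if mode == "sum":
        limit = threshold * len(levels) if scale_with_faces else threshold
        return sum(levels) > limit
    if mode == "mean":
        return mean(levels) > threshold
    raise ValueError(f"unknown mode: {mode}")
```

With one strongly smiling face among three, the first mode and the fixed-threshold second mode can trigger while the third mode does not, which is why the choice is left to the user.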

The above embodiment describes an example in which the moving image file period starts after the smile level exceeds the threshold, but the period may instead start at a point earlier than the point at which the smile level exceeded the threshold. This raises the likelihood of generating a moving image file that includes whatever caused the smile level to rise.
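Starting the period before the threshold crossing requires keeping recent frames available. One possible sketch uses a fixed-length buffer; the buffer length `preroll` and the function name are hypothetical, and a single smile level per frame is assumed.

```python
from collections import deque


def frames_with_preroll(frames, levels, threshold=0.6, preroll=2):
    """Return the frames passed on for recording, including up to `preroll`
    frames that immediately precede each threshold crossing.

    frames: sequence of frame objects; levels: smile level for each frame.
    """
    recent = deque(maxlen=preroll)  # frames seen while not recording
    recorded = []
    for frame, level in zip(frames, levels):
        if level > threshold:
            recorded.extend(recent)  # flush the buffered lead-in first
            recent.clear()
            recorded.append(frame)
        else:
            recent.append(frame)  # oldest frame drops out automatically
    return recorded
```

For simplicity this sketch returns one flat list of frames rather than separate files per period.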

The above embodiment also describes an example in which the moving image file period ends as soon as the smile level falls below the threshold, but the period may instead end after a predetermined time has elapsed since the smile level fell below the threshold. This reduces the likelihood of the recording being split into multiple moving image files when the smile level drops only briefly and then rises again.
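The delayed end can be sketched with a counter of consecutive below-threshold frames. Here the predetermined time is expressed as a frame count `hold`, which is an assumption, as is the function name; a single smile level per frame is used for brevity.

```python
def periods_with_hold(levels, threshold=0.6, hold=2):
    """Return [(start, end)] inclusive frame ranges, ending a period only
    after the level has stayed at or below the threshold for more than
    `hold` consecutive frames. The held frames are part of the period.
    """
    periods = []
    start = None
    below = 0  # consecutive frames at or below the threshold
    for i, level in enumerate(levels):
        if level > threshold:
            if start is None:
                start = i
            below = 0  # smile came back: cancel the pending end
        elif start is not None:
            below += 1
            if below > hold:
                periods.append((start, i - 1))
                start, below = None, 0
    if start is not None:
        periods.append((start, len(levels) - 1))
    return periods
```

Setting `hold=0` reproduces the immediate-end behavior, which makes the effect on file splitting easy to compare.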

It is also conceivable that, after the moving image file period has started, a person turns sideways or looks down, so that acquisition of the feature amounts of the face image is interrupted and the smile level can no longer be determined. In such a case, the smile determination unit 105 of this embodiment does not end the moving image file period as long as there is a person who is known from face tracking to be within the screen but whose feature amount acquisition is interrupted.

That is, if acquisition of the feature amounts used to determine the smile level of a detected person is interrupted after the moving image file period has started according to that person's smile level, the smile determination unit 105 controls the transmission of the control signal so that the generated moving image file includes the video signal of the interrupted period. This reduces the likelihood that the moving image file period ends even though the person is in fact still smiling. Alternatively, when a plurality of persons are detected, the smile level of the person whose feature amount acquisition is interrupted may be set to the average of the other persons' smile levels.
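Both behaviors described for interrupted feature acquisition can be sketched briefly. The representation of an interrupted measurement as `None` and the function names are assumptions made for illustration.

```python
from statistics import mean


def keep_recording(levels, recording, threshold=0.6):
    """Decide whether the current frame is passed on for recording.

    levels: dict face ID -> smile level, or None when the person is still
        tracked but the feature amounts could not be acquired.
    recording: True if a moving image file period is already in progress.
    """
    known = [v for v in levels.values() if v is not None]
    interrupted = len(known) < len(levels)
    if recording and interrupted:
        return True  # do not end the period while someone is unreadable
    return any(v > threshold for v in known)


def fill_missing(levels):
    """Alternative: replace each None with the average of the known levels."""
    known = [v for v in levels.values() if v is not None]
    if not known:
        return dict(levels)  # nothing to average over
    fallback = mean(known)
    return {k: (fallback if v is None else v) for k, v in levels.items()}
```

The first function keeps an ongoing period open; the second lets the ordinary threshold comparison run on substituted values.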

The above embodiment further describes an example in which the threshold compared with the smile level has the same value (0.6) for the start and the end of the moving image file period. Instead, for example, the threshold that decides the start of the moving image file may be made higher than the threshold that decides its end. According to the description, this raises the likelihood that the cause of the smile is included in the moving image file period. Alternatively, the threshold that decides the start of the moving image file may be made lower than the threshold that decides its end. According to the description, this reduces the likelihood of the recording being split into multiple moving image files when the smile level drops only briefly and then rises again.

That is, the smile determination unit 105 starts the period of the moving image file in response to the smile level reaching a first threshold, and ends the period in response to the smile level reaching a second threshold different from the first threshold. The file generation unit 110 generates a moving image file from the video signal of the period from when the smile level of a person detected from the video signal reaches the first threshold until that person's smile level reaches the second threshold. Setting different values for the threshold that determines the start of the period and the threshold that determines its end enables operation that takes hysteresis of the smile level value into account.
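
A minimal sketch of period detection with separate start and end thresholds is shown below. The function name `find_periods`, the representation of the video as a per-frame list of smile levels, and the exact comparisons (start when the level is higher than the start threshold, end when it is at or below the end threshold) are assumptions made for illustration; which of the two thresholds is higher is left to the caller.

```python
from typing import List, Optional, Sequence, Tuple


def find_periods(
    levels: Sequence[float],
    start_threshold: float,
    end_threshold: float,
) -> List[Tuple[int, int]]:
    """Return (start, end) frame index pairs, with end exclusive.

    A period opens at the first frame whose smile level exceeds
    start_threshold and closes at the first later frame whose smile level
    is at or below end_threshold. A period still open at the end of the
    input is closed at the last frame.
    """
    periods: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, level in enumerate(levels):
        if start is None:
            if level > start_threshold:
                start = i
        elif level <= end_threshold:
            periods.append((start, i))
            start = None
    if start is not None:
        periods.append((start, len(levels)))
    return periods
```

Passing the same value for both thresholds reproduces the single-threshold behavior of the embodiment.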

In S307 of FIG. 3, the frame specifying unit 106 determines whether the smile level of the current frame recorded in S306 is higher than the smile levels of the other frames since the start of the moving image file period. If the frame specifying unit 106 determines that the smile level of the current frame is higher than the highest smile level among the frames since the start of the moving image file, it outputs identification information for that frame to the frame cutout unit 107.

In S308, the frame cutout unit 107, having received the identification information from the frame specifying unit 106, acquires the YUV digital video signal of the corresponding frame from the imaging unit 101 and temporarily stores it.

If the smile determination unit 105 determines in S305 that the smile level is not higher than the threshold, the process proceeds to S309, where it determines whether a moving image is currently being recorded. If no moving image is being recorded, the process returns to S301; if a moving image is being recorded, the process proceeds to S310.

In S310, the smile determination unit 105 outputs a control signal indicating the end of the moving image file period to the switch unit 109, and the switch unit 109 stops outputting the video signal to the file generation unit 110.

In S311, the frame cutout unit 107 passes the YUV video signal of the frame corresponding to the identification information output from the frame specifying unit 106 to the thumbnail generation unit 108. The thumbnail generation unit 108 then generates a thumbnail image from the video signal of that frame in a format such as JPEG, and outputs the generated thumbnail image to the file generation unit 110.

As shown in FIG. 2, among the frames in the moving image file period, the frame in which person 2 is determined to have a smile level of 1.2 is identified as the frame with the highest smile level. The thumbnail generation unit 108 of this embodiment generates a thumbnail image from the video signal of that frame. However, the thumbnail image may instead be generated from, for example, the frame with the highest total or average smile level across a plurality of persons.
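
The selection of the thumbnail frame in S307 and S308 can be sketched as a running maximum. The class name `BestFrameTracker` and the pluggable scoring function are hypothetical and introduced only for illustration; the scoring function selects between the highest individual smile level (`max`), the total (`sum`), and the average (`statistics.mean`).

```python
from statistics import mean
from typing import Callable, Optional, Sequence


class BestFrameTracker:
    """Track the frame with the highest score since the period started."""

    def __init__(self, score: Callable[[Sequence[float]], float] = max) -> None:
        self._score = score
        self.best_index: Optional[int] = None
        self.best_score: Optional[float] = None

    def reset(self) -> None:
        """Forget the best frame, e.g. when a new period starts."""
        self.best_index = None
        self.best_score = None

    def update(self, frame_index: int, levels: Sequence[float]) -> bool:
        """Score one frame; return True if it becomes the new best frame.

        levels holds the smile level of each person detected in the frame.
        Frames with no detected person are ignored.
        """
        if not levels:
            return False
        score = self._score(levels)
        if self.best_score is None or score > self.best_score:
            self.best_index = frame_index
            self.best_score = score
            return True
        return False


# Example: per-frame smile levels for two persons.
frames = [[0.7, 0.3], [0.5, 1.2], [0.9, 0.9]]
by_max = BestFrameTracker(max)
by_mean = BestFrameTracker(mean)
for i, frame_levels in enumerate(frames):
    by_max.update(i, frame_levels)
    by_mean.update(i, frame_levels)
```

In this example, scoring by the highest individual level selects the frame containing the level of 1.2, while scoring by the average selects the frame in which both persons are at 0.9.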

In S312 (generation procedure), the file generation unit 110 generates a moving image file based on the video signal passed from the switch unit 109. For example, the file generation unit 110 compresses the YUV video signal from the switch unit 109 using H.264 and the audio signal using Dolby (registered trademark), and multiplexes them to generate a moving image file in the AVCHD format. The file generation unit 110 outputs the generated moving image file in association with the thumbnail image generated by the thumbnail generation unit 108.

As described above, the processing apparatus 100 of this embodiment generates a moving image file from the video signal of a period that includes the period in which the smile level of a face image detected from the video signal is higher than the threshold. This makes it possible to generate a highly satisfying moving image file while reducing the load on the creator of the moving image file.

The above embodiment has mainly described an example in which a moving image file is generated from the video signal of a period from the timing at which the smile level is determined to be higher than the threshold to the timing at which it is determined to have fallen to or below the threshold, but the invention is not limited to this example. For example, a moving image file may be generated from the video signal of a period from the timing at which the smile level is determined to be higher than the threshold until a predetermined time later (for example, 5 seconds later). In this case, the file generation unit 110 generates a moving image file from the video signal of the period from when the determined smile level becomes higher than the threshold until the predetermined time has elapsed.
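
The fixed-duration variation can be sketched as follows. The function name, the frame rate parameter `fps`, and the rule that threshold crossings inside an already open period are ignored are assumptions made for this sketch; the text specifies only the threshold comparison and the predetermined time.

```python
from typing import List, Sequence, Tuple


def fixed_length_periods(
    levels: Sequence[float],
    threshold: float = 0.6,
    fps: float = 30.0,
    duration_s: float = 5.0,
) -> List[Tuple[int, int]]:
    """Return (start, end) frame index pairs, with end exclusive.

    Each period starts at a frame whose smile level exceeds the threshold
    and lasts duration_s seconds, truncated at the end of the input.
    """
    frames_per_period = max(1, round(fps * duration_s))
    periods: List[Tuple[int, int]] = []
    next_free = 0  # first frame index not covered by an open period
    for i, level in enumerate(levels):
        if i >= next_free and level > threshold:
            next_free = min(i + frames_per_period, len(levels))
            periods.append((i, next_free))
    return periods
```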

In addition, if it is determined in S302 that moving image shooting has ended after moving image recording was started in S306 of FIG. 3, the processing may end after the moving image file is generated. In this case, a thumbnail is also generated.

FIG. 5 is a network connection configuration diagram for the case where the processing apparatus 100 of this embodiment is connected to a home network 400. FIG. 5 illustrates an example in which the processing apparatus 100 is implemented as a digital video camera, but it may also be implemented as a digital camera capable of shooting moving images, a network camera, a portable device, or the like. Although this embodiment describes an example in which the processing apparatus 100 connects to the home network 400 via a wireless LAN, a wired connection may also be used. The media player 200 receives a moving image file distributed from the processing apparatus 100, decodes it, and outputs it to the television monitor 201. The display device 202 is a display device with a built-in media player function, for example a digital photo frame. The router 300 is a router with a wireless access point; it communicates wirelessly with the processing apparatus 100 and mediates its connection to the home network 400. The router 300 also functions as a gateway to the Internet 401. The numbers of processing apparatuses 100, media players 200, television monitors 201, and display devices 202 are not limited to those in the example of FIG. 5, and many of each may be present. The home network 400 may also be a network such as the Internet or an intranet, provided that it has sufficient bandwidth to carry the packet data. The physical connection to the home network 400 may be wireless as well as wired; the physical form does not matter as long as the devices are connected at the protocol level. Further, the media player 200, the television monitor 201, and the display device 202 may be implemented as a single device.

In the above configuration, the file generation unit 110 of the processing apparatus 100 records the thumbnail image generated by the thumbnail generation unit 108 as metadata of the generated moving image file. In addition, by being provided with a function for recognizing persons from face images, the file generation unit 110 of the processing apparatus 100 can also record information such as the name of the person corresponding to a detected face image in the metadata of the moving image file. This makes it easier to select the moving image file to be played back on the media player 200.

For example, by using the Content Directory Service (CDS), which is a function of UPnP-AV (Universal Plug and Play Audio Visual), the processing apparatus 100 transmits in advance to the media player 200 a playlist that includes titles such as "○○'s smile video." This allows the user of the media player 200 to grasp the contents of the moving image files recorded in the processing apparatus 100, which is connected via the home network 400.
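
As a rough illustration of how a title and a thumbnail might be exposed through a Content Directory Service, the sketch below builds a DIDL-Lite description of one video item, which is the XML vocabulary UPnP-AV uses for content metadata. The text does not specify the metadata layout, so the function name, the item and parent IDs, the URLs, the `protocolInfo` value, and the use of `upnp:albumArtURI` to carry the thumbnail are all assumptions.

```python
import xml.etree.ElementTree as ET

# Namespaces used by DIDL-Lite metadata in UPnP-AV.
DIDL = "urn:schemas-upnp-org:metadata-1-0/DIDL-Lite/"
DC = "http://purl.org/dc/elements/1.1/"
UPNP = "urn:schemas-upnp-org:metadata-1-0/upnp/"


def build_didl_item(
    item_id: str, parent_id: str, title: str, video_url: str, thumbnail_url: str
) -> str:
    """Return a DIDL-Lite document describing a single video item."""
    ET.register_namespace("", DIDL)
    ET.register_namespace("dc", DC)
    ET.register_namespace("upnp", UPNP)

    root = ET.Element(f"{{{DIDL}}}DIDL-Lite")
    item = ET.SubElement(
        root,
        f"{{{DIDL}}}item",
        {"id": item_id, "parentID": parent_id, "restricted": "1"},
    )
    ET.SubElement(item, f"{{{DC}}}title").text = title
    ET.SubElement(item, f"{{{UPNP}}}class").text = "object.item.videoItem"
    # Assumption: the thumbnail is advertised as album art.
    ET.SubElement(item, f"{{{UPNP}}}albumArtURI").text = thumbnail_url
    # Assumption: placeholder protocolInfo; a real device would advertise
    # the actual transport and media format of the file.
    res = ET.SubElement(
        item, f"{{{DIDL}}}res", {"protocolInfo": "http-get:*:video/mpeg:*"}
    )
    res.text = video_url
    return ET.tostring(root, encoding="unicode")


didl_xml = build_didl_item(
    "1", "0", "Smile video", "http://192.0.2.1/clip.m2ts", "http://192.0.2.1/clip.jpg"
)
```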

The present invention can also be realized by executing the following processing: software (a program) that implements the functions of the above-described embodiment is supplied to a system or apparatus via a network or various storage media, and a computer (or a CPU, MPU, or the like) of that system or apparatus reads out and executes the program.

Claims

A processing apparatus comprising:
input means for inputting a video signal;
determination means for determining a smile level from a feature amount of a person detected from the input video signal;
comparing means for comparing the determined smile level with a threshold; and
generating means for generating a moving image file from the video signal of a period, within the input video signal, that includes a period in which the determined smile level is higher than the threshold.

The processing apparatus according to claim 1, wherein the comparing means compares a total value of the smile levels of a plurality of persons detected from the input video signal with the threshold, and
the generating means generates a moving image file from the video signal of a period that includes a period in which the total value of the smile levels of the plurality of persons is higher than the threshold.

The processing apparatus according to claim 1, wherein the comparing means compares an average value of the smile levels of a plurality of persons detected from the input video signal with the threshold, and
the generating means generates a moving image file including the video signal of a period in which the average value of the smile levels of the plurality of persons is higher than the threshold.

The processing apparatus according to claim 1, wherein the comparing means compares, with the threshold, the smile level of a first person who has the higher smile level of a first person and a second person detected from the input video signal.

The processing apparatus according to claim 1, further comprising designating means for designating a person from among a plurality of persons detected from the input video signal,
wherein the comparing means compares the smile level of the person designated by the designating means with the threshold.

The processing apparatus according to claim 1, wherein the generating means generates a moving image file from the video signal of a period from when the smile level of the person detected from the video signal reaches a first threshold until the smile level of the person reaches a second threshold different from the first threshold.

The processing apparatus according to any one of claims 1 to 6, wherein, when acquisition of the feature amount for determining the smile level of the person detected from the input video signal is interrupted after a period for generating the moving image file has started according to the smile level of the detected person, the generating means generates a moving image file including the video signal of the interrupted period.

The processing apparatus according to any one of claims 1 to 7, wherein the generating means generates a thumbnail image of the moving image file based on a second moving image frame having a second smile level that is higher than a first smile level of a first moving image frame in the period for generating the moving image file.

The processing apparatus according to claim 1, wherein the generating means generates a moving image file from the video signal of a period from when the smile level reaches the threshold until a predetermined time after the smile level becomes lower than the threshold.

The processing apparatus according to claim 1, wherein the generating means generates a moving image file from the video signal of a period from when the determined smile level becomes higher than the threshold until a predetermined time later.

A method of generating a moving image file performed by a processing apparatus, the method comprising:
an input step of inputting a video signal;
a determination step of determining a smile level from a feature amount of a person detected from the input video signal;
a comparison step of comparing the determined smile level with a threshold; and
a generation step of generating a moving image file from the video signal of a period, within the input video signal, that includes a period in which the determined smile level is higher than the threshold.

A program for causing a computer to execute:
an input procedure of inputting a video signal;
a determination procedure of determining a smile level from a feature amount of a person detected from the input video signal;
a comparison procedure of comparing the determined smile level with a threshold; and
a generation procedure of generating a moving image file from the video signal of a period, within the input video signal, that includes a period in which the determined smile level is higher than the threshold.