WO2012014590A1 - Video processing device and method - Google Patents
- Publication number
- WO2012014590A1 (PCT/JP2011/063701)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- hand
- person
- face
- range
- detecting
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
- G06T7/74—Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/107—Static hand or arm
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
- G06T2207/30201—Face
Definitions
- The present invention relates to the technical field of video processing apparatuses and methods that detect a predetermined object present in a video by, for example, performing various processes on the video.
- Non-Patent Document 1 proposes a technique of learning an SVM (Support Vector Machine) using histograms of luminance gradient directions as feature quantities, and detecting a person in a video.
- The present invention has been made in view of, for example, the problems described above, and its object is to provide a video processing apparatus and method capable of efficiently and accurately detecting a human hand in a video.
- To solve the above problem, the video processing apparatus of the present invention comprises face detection means for detecting a person's face from an image; range estimation means for estimating, based on information on the detected face, a hand presence range in which the person's hand can exist; and hand detection means for performing a process of detecting the person's hand within the hand presence range.
- To solve the above problem, the video processing method of the present invention comprises a face detection step of detecting a person's face from an image; a range estimation step of estimating, based on information on the detected face, a hand presence range in which the person's hand can exist; and a hand detection step of performing a process of detecting the person's hand within the hand presence range.
- The video processing apparatus according to the present embodiment includes a face detection unit that detects a person's face from an image, a range estimation unit that estimates a hand presence range in which the person's hand can exist based on the detected face information, and a hand detection unit that performs a process of detecting the person's hand within the hand presence range.
- In operation, a human face is first detected from the images constituting the video. Specifically, for example, the entire image is scanned, and a portion whose luminance and other conditions match predetermined values recognizable as a human face is detected as a face.
- a hand presence range in which a human hand can exist is estimated based on the detected face information.
- face information refers to various information that can be read from the detected face, and includes, for example, the position, size, orientation, and the like of the face.
- Because of the structure of the human body, a person's hand cannot exist at a position very far from the face. That is, if the position of a person's face is known, that person's hand can exist only within arm's reach of it. Therefore, if the face can be detected, the range in which the hand can exist can be estimated with high accuracy based on the face information.
- processing for detecting a human hand in the hand presence range is performed. That is, processing for detecting a person's hand is performed only in the hand presence range in the image.
- Examples of the process for detecting a hand include matching with a template image prepared in advance, a method using a luminance gradient direction histogram, a method using a boundary line shape of a skin color region, and the like.
- Such detection methods are merely examples, however, and the hand detection method according to the present embodiment is not limited in any way.
- If the above-described hand presence range were not estimated, the hand detection process would have to be performed on the entire image. In this case, the wide processing range increases both the time required to detect a hand and the load of the detection process.
- In this embodiment, by contrast, the process of detecting the hand is performed only within the hand presence range of the image. Therefore, a hand can be detected very efficiently.
- Compared with detecting the hand directly without detecting the face, the face detection step does add some detection time and processing load.
- However, human face detection is technically mature and already in practical use, so it can be used without imposing a new burden. For this reason, the influence of the face detection process is almost negligible.
- Furthermore, since the hand detection process is performed only within the hand presence range, when another person's hand exists outside that range, for example, it is prevented from being detected as the hand of the target person. That is, erroneous detection can be prevented and a person's hand can be detected accurately.
- According to the video processing apparatus of the present embodiment, therefore, the hand of a person in the video can be detected efficiently and accurately.
- the range estimation means estimates a circle centered on the person's face as the hand presence range.
- the process for estimating the hand presence range can be made extremely simple.
- The "circle centered on a person's face" here is not limited to a circle centered exactly on the middle of the face; it is a broad concept including various circles, or even ellipses, that contain the face near their center.
- the range in which a person's hand can exist is the range in which the person's arm extends. Then, it is considered that the portion serving as a fulcrum of the arm (specifically, near the shoulder) is located very close to the face. Therefore, for example, if a circle whose radius is the length of the person's arm is estimated as the hand presence range, the hand can be detected more efficiently.
- the length of the arm can be estimated from the size of the detected face, for example.
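As a minimal sketch of this aspect (the function names, the `(x, y, w, h)` face-box format, and the radius factor of twice the face height are illustrative assumptions drawn from the first embodiment described later, not a definitive implementation), the circular hand presence range can be derived from a detected face rectangle, using the face size as a proxy for arm length:

```python
# Sketch: estimate the hand presence range as a circle around a detected
# face. The radius factor (2x the face height) follows the first
# embodiment's example; the (x, y, w, h) box format is an assumption.

def estimate_hand_presence_range(face_box, radius_factor=2.0):
    """Return (cx, cy, radius) of the circle in which a hand may exist."""
    x, y, w, h = face_box
    cx = x + w / 2.0            # center of the detected face rectangle
    cy = y + h / 2.0
    radius = radius_factor * h  # arm's reach approximated from face size
    return cx, cy, radius

def point_in_range(px, py, hand_range):
    """True if pixel (px, py) lies inside the hand presence circle."""
    cx, cy, r = hand_range
    return (px - cx) ** 2 + (py - cy) ** 2 <= r ** 2

hand_range = estimate_hand_presence_range((100, 50, 40, 60))
```

A detector would then skip any pixel for which `point_in_range` is false, which is how the narrowing described above saves processing time.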
- In another aspect, the video processing apparatus further includes a learning unit that learns the relative positional relationship between a person's face and the person's hand, and the range estimation unit estimates the hand presence range based on the learned relative positional relationship.
- the relative positional relationship of the person's face and the person's hand is learned by referring to many images in advance.
- the “relative positional relationship” includes not only a simple positional relationship between a person's face and a person's hand, but also information regarding the size and orientation of the face and hand.
- the range in which the process of detecting the hand is performed becomes a more appropriate range. Therefore, it is possible to detect the hand very efficiently.
- If the detection process is performed in order starting from ranges where a hand is most likely to exist (that is, ranges in which a hand was detected with high frequency during learning of the relative positional relationship), the hand can be detected even more efficiently.
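The ordering idea above can be sketched as follows. This is a hypothetical illustration: the patent does not specify a data structure, so a small grid of per-cell hand-observation counts (positions relative to the face, accumulated during learning) is assumed here.

```python
# Hypothetical sketch: order candidate search cells by how often a hand
# was observed there relative to the face during learning, so the most
# probable regions are searched first. The grid values are illustrative.

def search_order(frequency_grid):
    """Yield (row, col) cells from highest to lowest learned hand frequency."""
    cells = [(r, c) for r, row in enumerate(frequency_grid)
                    for c, _ in enumerate(row)]
    return sorted(cells, key=lambda rc: frequency_grid[rc[0]][rc[1]],
                  reverse=True)

freq = [[0, 2, 1],
        [5, 9, 4],
        [3, 8, 6]]
order = search_order(freq)
```

Searching cells in this order lets the detector stop early once a hand is found in a high-probability region.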
- the hand detection unit includes a scale estimation unit that estimates the size of the person's hand based on information about the person's face.
- the size of the hand is estimated based on the detected face information. For example, if the face is detected large, it is estimated that the detected hand will be large. On the other hand, if the face is detected small, it is estimated that the detected hand will be small.
- the process of detecting the hand can be performed extremely efficiently and accurately.
- the hand detection unit includes a rotation angle estimation unit that estimates a rotation angle of the person's hand based on information on the person's face.
- the rotation angle of the hand (that is, the hand inclination) is estimated based on the detected face information.
- the hand rotation angle is estimated to be within a predetermined range with respect to the detected face rotation angle. That is, the hand rotation angle can be estimated by utilizing the fact that the face rotation angle and the hand rotation angle are relatively close to each other.
- the process of detecting the hand can be performed very efficiently and accurately.
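The scale and rotation aspects above can be sketched together. The concrete ranges (0.7 to 1.5 times the face height; plus or minus 45 degrees of the face rotation) and step counts (five scale steps, seven rotation steps) are taken from the first embodiment described later in this document; the function names and the evenly-spaced stepping are illustrative assumptions.

```python
# Sketch: derive candidate hand scales and rotation angles from the
# detected face, using the first embodiment's example ranges.

def candidate_scales(face_height, low=0.7, high=1.5, steps=5):
    """Hand-size candidates as multiples of the face height (0.7x .. 1.5x)."""
    step = (high - low) / (steps - 1)
    return [round((low + i * step) * face_height, 2) for i in range(steps)]

def candidate_rotations(face_angle, spread=45, steps=7):
    """Hand rotation candidates within +/- 45 degrees of the face angle."""
    step = 2 * spread / (steps - 1)
    return [face_angle - spread + i * step for i in range(steps)]

scales = candidate_scales(100)      # face height h = 100 px
rotations = candidate_rotations(0)  # upright face
```

Restricting template matching to these few scale/rotation candidates, instead of trying every size and angle, is what makes the detection both faster and less prone to false matches.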
- In another aspect, the hand detection means includes skin color area detection means for detecting a skin color area from the image, and the person's hand is detected in the skin color area within the hand presence range.
- the skin color region is detected from the image before the processing for detecting the hand is performed.
- The skin color region is detected based on, for example, a preset skin color component. Alternatively, it may be detected based on the color of the detected face.
- a process for detecting the hand is performed on the skin color area in the hand presence range. That is, even in the hand presence range, the process for detecting the hand is not performed for the non-skin color area, and the process for detecting the hand is performed only in the skin color area in the hand presence range.
- The color of a person's hand can be assumed to be a skin color unless, for example, gloves are worn. For this reason, if the skin color area is detected, the range in which the hand detection process is performed can be further narrowed, so the hand can be detected very efficiently.
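A minimal sketch of this skin-color narrowing follows. The RGB thresholds in the predicate are an illustrative assumption, not values from the patent; as the text notes, a real implementation might instead calibrate the predicate from the color of the detected face.

```python
# Sketch: keep only pixels that are both inside the hand presence circle
# and skin-colored. The image is a nested list of (r, g, b) tuples and
# the skin predicate is a crude illustrative threshold.

def is_skin_color(r, g, b):
    """Very rough skin-color predicate (assumed thresholds)."""
    return r > 95 and g > 40 and b > 20 and r > g and r > b

def skin_pixels_in_range(image, hand_range):
    """Pixels inside the hand presence circle that are skin-colored."""
    cx, cy, radius = hand_range
    hits = []
    for y, row in enumerate(image):
        for x, (r, g, b) in enumerate(row):
            inside = (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2
            if inside and is_skin_color(r, g, b):
                hits.append((x, y))
    return hits

image = [
    [(200, 150, 120), (10, 10, 10)],    # skin-like, dark
    [(30, 30, 200),   (210, 160, 130)], # blue, skin-like
]
hits = skin_pixels_in_range(image, (0.0, 0.0, 1.2))
```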
- In another aspect, the hand detection unit includes a difference area detection unit that compares a past image with the current image and detects a difference area in which the difference in luminance is equal to or greater than a predetermined value, and the person's hand is detected in the difference area within the hand presence range.
- the difference area is detected before the process of detecting the hand is performed.
- the difference area is an area where a difference in luminance is equal to or greater than a predetermined value by comparing a past image and a current image.
- the “predetermined value” is a value set according to the difference in luminance between the portion where the hand is present and the portion where the hand is not present.
- A process for detecting a hand is then performed on the difference area within the hand presence range. That is, even within the hand presence range, the hand detection process is not performed on regions that are not difference areas; it is performed only in the difference area within the hand presence range.
- a person's hand is considered to move much more than other objects in the image. For this reason, there is a high possibility that the position of the hand is different between the past image and the current image. Therefore, it is estimated that there is a high possibility that a hand is present in an area where the luminance difference is equal to or greater than a predetermined value (that is, an area where the luminance has greatly changed). For this reason, if the difference area is detected, the range for performing the hand detection process can be further narrowed. Therefore, in this aspect, it is possible to detect a hand very efficiently.
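The frame-difference step above can be sketched as follows. The grayscale frame layout (lists of rows) and the threshold value are assumptions for illustration; the patent only specifies that the threshold is set according to the luminance difference between hand and non-hand portions.

```python
# Sketch: compare the previous and current grayscale frames and keep the
# pixels whose luminance change meets a threshold -- the "difference
# area" in which a moving hand is likely to be found.

def difference_region(prev_frame, curr_frame, threshold=30):
    """Return (x, y) coordinates whose luminance changed by >= threshold."""
    region = []
    for y, (prev_row, curr_row) in enumerate(zip(prev_frame, curr_frame)):
        for x, (p, c) in enumerate(zip(prev_row, curr_row)):
            if abs(c - p) >= threshold:
                region.append((x, y))
    return region

prev_frame = [[100, 100, 100],
              [100, 100, 100]]
curr_frame = [[100, 160, 100],   # a bright region moved into row 0
              [105, 100, 240]]
moved = difference_region(prev_frame, curr_frame)
```

Intersecting `moved` with the hand presence circle gives the final, much smaller search area described in this aspect.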
- The video processing method according to the present embodiment includes a face detection step of detecting a person's face from an image, a range estimation step of estimating a hand presence range in which the person's hand may exist based on the detected face information, and a hand detection step of performing a process of detecting the person's hand within the hand presence range.
- According to the video processing method of the present embodiment, a person's face is detected, and the hand detection process is performed within the hand presence range estimated based on the detected face information. Therefore, the hand of a person in the video can be detected efficiently and accurately.
- The video processing method of the present invention can also adopt various aspects similar to those of the video processing apparatus of the present invention described above.
- FIG. 1 is a block diagram showing the configuration of the video processing apparatus according to the first embodiment.
- The video processing apparatus includes a video input unit 110, a face detection unit 120, a hand presence range estimation unit 130, a hand detection unit 140, and an output unit 150.
- the video input unit 110 inputs a video to be detected from the outside of the apparatus to the video processing apparatus.
- the video is input as, for example, a plurality of continuous images.
- the face detection unit 120 is an example of the “face detection unit” of the present invention, and scans an input image to detect the face of a person in the image.
- the hand presence range estimation unit 130 is an example of the “range estimation unit” of the present invention, and based on the detected face information (for example, the position, size, orientation, etc. of the face) The range in which a person's hand can exist is estimated.
- the hand detection unit 140 is an example of the “hand detection unit” of the present invention, and performs a process of detecting a hand in the estimated hand presence range to detect a person's hand.
- the output unit 150 outputs information on the detected human hand.
- The hand information output here is used, for example, to operate other devices according to the movement of the hand (for example, televisions, air conditioners, lighting devices, personal computers, game machines, portable devices, ticket machines, digital signage and other information terminals in public spaces, housing equipment, automotive in-vehicle equipment, and the like).
- FIG. 2 is a flowchart showing the operation of the video processing apparatus according to the first embodiment.
- FIG. 3 is a plan view showing an example of the input image
- FIG. 4 is a plan view conceptually showing the face detection method.
- the video input unit 110 first acquires an image constituting the video (step S01).
- an image showing the face 210 and the hand 220 of the person 200 is acquired as shown in FIG.
- Next, the face detection unit 120 detects the face 210 of the person 200 in the image (step S02). For example, the face detection unit 120 scans the entire acquired image, and a portion whose luminance and other conditions match predetermined values recognizable as a human face is detected as a face.
- the face 210 is detected so as to draw a rectangle surrounding the face 210.
- the coordinates (x, y) of the upper left point of the rectangle surrounding the face 210 and the values of the width w and the height h of the face 210 are acquired.
- various information such as the orientation of the face may be detected.
- FIG. 5 is a flowchart showing the hand presence range estimation method
- FIG. 6 is a plan view conceptually showing the hand presence range estimation method
- FIG. 7 is a conceptual diagram showing a method for estimating the scale of the hand to be detected
- FIG. 8 is a conceptual diagram showing a method for estimating the rotation angle of the hand to be detected.
- the hand presence range estimation unit 130 estimates a circle centered on the detected face as the hand presence range (step S11). Specifically, a circle P1 as shown in FIG. 6 is estimated as the hand presence range.
- the human hand cannot exist at a position very far from the face due to the structure of the human body. That is, if the position of the person's face is known, the hand can only exist within the range where the arm extends from there. Therefore, for example, if a circle whose radius is the length of the person's arm is estimated as the hand presence range, the hand can be detected more efficiently.
- the radius of the circle P1 is determined according to the size of the detected face 210. More specifically, twice the height h of the face 210 is the radius of the circle P1.
- the scale of the hand that will be detected is estimated based on the size of the detected face 210 (step S12).
- the hand scale is estimated to be in the range of 0.7 to 1.5 times the height h of the detected face 210, for example.
- In the present embodiment, a template with five scale steps is set.
- the rotation angle of the hand that will be detected is estimated (step S13).
- The rotation angle of the hand is estimated to be in the range of +45 degrees to −45 degrees with respect to the detected rotation angle of the face 210, for example.
- In the present embodiment, a template with seven rotation steps is set.
- In step S14, it is determined whether the processing for all the faces in the image (that is, the processing from step S11 to step S13) has been completed. If not (step S14: NO), the processing from step S11 to step S13 is performed again for the remaining faces. If it has been completed for all faces (step S14: YES), combinations of the range, scale, and rotation angle for actually performing hand detection are set (step S16). That is, the parameters estimated in steps S11 to S13 are combined with each other to set the conditions under which the hand is actually detected.
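The parameter-combination step (step S16) can be sketched as below. The dictionary layout and field names are illustrative assumptions; the point is that each face yields a Cartesian product of its scale candidates and rotation candidates, each tagged with that face's presence range.

```python
# Sketch of step S16: combine the per-face estimates (presence range,
# scale candidates, rotation candidates) into the list of conditions
# under which template matching is actually attempted.

def detection_conditions(face_estimates):
    """Cartesian product of scales x rotations per face, tagged with range."""
    conditions = []
    for est in face_estimates:
        for scale in est["scales"]:
            for angle in est["rotations"]:
                conditions.append({
                    "range": est["range"],   # circle (cx, cy, radius)
                    "scale": scale,
                    "rotation": angle,
                })
    return conditions

# One face of height 60 px: 5 scale steps (0.7x .. 1.5x) and 7 rotation
# steps (-45 .. +45 degrees), per the first embodiment's example values.
faces = [{"range": (120.0, 80.0, 120.0),
          "scales": [42.0, 54.0, 66.0, 78.0, 90.0],
          "rotations": [-45, -30, -15, 0, 15, 30, 45]}]
conds = detection_conditions(faces)
```

With these example values, each face contributes 5 × 7 = 35 matching conditions, all confined to its presence circle.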
- the hand detecting process is performed in the hand detecting unit 140 (step S04).
- the hand detection unit 140 detects the hand 220 in the image by matching with a template image prepared in advance (see FIGS. 7 and 8). Note that it is also possible to perform hand detection using a method using a luminance gradient direction histogram or a method using a boundary line shape of a skin color region.
- the above-described hand presence range (see FIG. 6) is not estimated, it is required to perform processing for detecting the hand 220 on the entire image. In this case, the time until the hand 220 is detected and the load due to the detection process increase due to the wide processing range.
- the process of detecting the hand 220 is performed only in the hand presence range in the image. Therefore, it is possible to detect a hand very efficiently. Furthermore, since the process of detecting the hand only in the hand presence range is performed, for example, when the hand of another person exists outside the hand presence range, the hand of the other person is detected. Can be prevented from being detected as a hand. That is, it is possible to prevent erroneous detection and accurately detect a human hand.
- When the detection of the hand 220 is completed, the output unit 150 outputs the detected hand information as a result (step S05).
- the output unit 150 outputs, for example, the detected position, scale, and rotation angle of the hand 220. If a plurality of hands 220 are detected, the results are output for all hands 220.
- the hand presence range is estimated based on the detected face information. Therefore, it is possible to efficiently and accurately detect the hand of a person in the video.
- FIG. 9 is a block diagram showing the configuration of the video processing apparatus according to the second embodiment.
- FIG. 10 is a flowchart showing the operation of the video processing apparatus according to the second embodiment.
- The second embodiment differs from the first embodiment described above in part of its configuration and operation, and the other parts are substantially the same. For this reason, the following description details the parts that differ from the first embodiment, and other descriptions are omitted as appropriate.
- The video processing apparatus is configured to include a skin color area detection unit 160, a difference area detection unit 170, and a memory 175, in addition to the same configuration as that of the first embodiment.
- the skin color area detection unit 160 is an example of the “skin color area detection unit” of the present invention, and detects a skin color area in an image.
- The skin color region is detected based on, for example, a preset skin color component. Alternatively, it may be detected based on the color of the detected face 210.
- the difference area detection unit 170 is an example of the “difference area detection unit” of the present invention, and the difference in luminance between the past image (for example, the image one frame before) stored in the memory 175 and the current image is A difference area that is equal to or greater than a predetermined value is detected.
- the video input unit 110 acquires an image constituting the video (step S21).
- the face detection unit 120 detects the face 210 of the person 200 in the image (step S22). That is, the same processing as in the first embodiment is performed.
- the skin color area detection unit 160 detects a skin color area in the image (step S23). Further, the difference area detection unit 170 detects a difference area in the image using the past image stored in the memory 175 (step S24).
- the circle P1 centered on the detected face 210 is estimated as the hand presence range (step S25).
- From the hand presence range estimated here, areas that are not skin color areas and areas that are not difference areas are excluded. That is, the hand presence range is the part of the circle P1 centered on the face 210 that is both a skin color region and a difference region.
- the color of the hand 220 is considered to be a skin color unless, for example, gloves are used.
- The hand 220 in the image is considered to move more than other objects. For this reason, there is a high possibility that the position of the hand 220 differs between the past image and the current image. Therefore, it is estimated that the hand 220 is highly likely to be present in an area where the luminance difference is equal to or greater than a predetermined value (that is, an area where the luminance has changed greatly).
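The second embodiment's narrowing therefore amounts to an intersection of three candidate sets. A minimal sketch (representing each area as a set of pixel coordinates is an illustrative assumption):

```python
# Sketch: a pixel is searched only if it lies inside the face-centered
# circle AND in the skin color area AND in the frame-difference area.

def narrowed_hand_range(circle_pixels, skin_pixels, diff_pixels):
    """Intersection of the three candidate pixel sets."""
    return circle_pixels & skin_pixels & diff_pixels

circle = {(1, 1), (1, 2), (2, 1), (2, 2)}   # inside circle P1
skin   = {(1, 1), (2, 2), (5, 5)}           # skin-colored pixels
diff   = {(2, 2), (3, 3)}                   # luminance changed pixels
search_area = narrowed_hand_range(circle, skin, diff)
```

Only the pixels surviving all three conditions are passed to the hand detection unit 140, which is why this embodiment is more efficient than the first.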
- Hand detection processing is then performed in the hand detection unit 140 (step S26). Then, the output unit 150 outputs the detected hand information as a result (step S27).
- the hand presence range is estimated based on the skin color area and the difference area. Therefore, it is possible to detect the hand of a person in the video more efficiently and accurately.
- FIG. 11 is a block diagram showing the configuration of the video processing apparatus according to the third embodiment.
- FIG. 12 is a conceptual diagram showing a plurality of images used for learning
- FIG. 13 is a plan view conceptually showing a hand presence range estimated based on learning.
- The third embodiment differs from the first and second embodiments described above in part of its configuration and operation, and the other parts are substantially the same. For this reason, the following description details the parts that differ from the first and second embodiments, and other descriptions are omitted as appropriate.
- the video processing apparatus includes a learning unit 180 and a learning storage unit 185 in addition to the same configuration as that of the first embodiment.
- the learning unit 180 is an example of the “learning unit” of the present invention, and learns the relative positional relationship between the face of the person and the hand from the image stored in the learning storage unit 185.
- the learning storage unit 185 stores a plurality of images, for example, as shown in FIG.
- the learning in the learning unit 180 may be performed every time an image is acquired in the video input unit 110, or may be learned in advance using a predetermined image. That is, the learning unit 180 may perform learning in real time during operation of the apparatus, or may simply store the results of learning performed in advance.
- Unlike the first and second embodiments described above, the hand presence range estimation unit 130 estimates the hand presence range based on the learning result in the learning unit 180. Specifically, as shown in FIG. 13, a range P2 in which the hand 220 is highly likely to be present with respect to the position of the face 210 is estimated. In FIG. 13, the inner circles indicate positions where a hand is more likely to be present.
- If the hand presence range is estimated based on the learning result, a more appropriate hand presence range can be estimated than when, as shown in FIG. 6, the circle P1 centered on the face 210 is estimated as the hand presence range. Specifically, since the upper and right portions of the face 210 in the figure, where the hand presence probability is low, can be excluded, the hand presence range can be estimated as a narrower range. Therefore, the efficiency of the hand detection process can be further increased.
- the hand presence range is estimated based on the learning result. Therefore, it is possible to detect the hand of a person in the video more efficiently and accurately.
Description
The present invention relates to the technical field of video processing apparatuses and methods that detect a predetermined object present in a video by, for example, performing various processes on the video.
As this type of video processing device, there are devices that detect a predetermined object, such as a person, from captured video. When detecting an object, local image features common to the objects to be detected are used. For example, Non-Patent Document 1 proposes a technique of learning an SVM (Support Vector Machine) using histograms of luminance gradient directions as feature quantities, and detecting a person in a video.
However, in the technique described above, detecting an object requires first scanning the entire image, which makes an increase in detection time and processing load unavoidable. Such an inconvenience can be kept relatively small when detecting an object that is easy to detect (that is, one with clear features), such as a human face. On the other hand, it becomes a comparatively large problem when detecting an object that is hard to detect, such as a human hand (for example, because few features are available for detection, or because the shape is not fixed and changes greatly with the situation). That is, the above technique has the technical problem that it is difficult to detect various objects efficiently.
The present invention has been made in view of, for example, the problems described above, and its object is to provide a video processing apparatus and method capable of efficiently and accurately detecting a human hand in a video.
To solve the above problem, the video processing apparatus of the present invention comprises face detection means for detecting a person's face from an image; range estimation means for estimating, based on information on the detected face, a hand presence range in which the person's hand can exist; and hand detection means for performing a process of detecting the person's hand within the hand presence range.
To solve the above problem, the video processing method of the present invention comprises a face detection step of detecting a person's face from an image; a range estimation step of estimating, based on information on the detected face, a hand presence range in which the person's hand can exist; and a hand detection step of performing a process of detecting the person's hand within the hand presence range.
The operation and advantages of the present invention will become apparent from the embodiments for carrying out the invention described below.
The video processing apparatus according to the present embodiment includes a face detection unit that detects a person's face from an image, a range estimation unit that estimates a hand presence range in which the person's hand can exist based on the detected face information, and a hand detection unit that performs a process of detecting the person's hand within the hand presence range.
本実施形態に係る映像処理装置によれば、その動作時に、先ず映像を構成する画像から人物の顔が検出される。具体的には、例えば画像全体がスキャンされ、輝度等の条件が、人間の顔であると認識できるような所定の値となっている部分が顔として検出される。 According to the video processing apparatus according to the present embodiment, during the operation, a human face is first detected from the images constituting the video. Specifically, for example, the entire image is scanned, and a portion having a predetermined value such that the condition such as luminance can be recognized as a human face is detected as a face.
続いて本実施形態では、検出された顔の情報に基づいて、人物の手が存在し得る手存在範囲が推定される。尚、ここでの「顔の情報」とは、検出された顔から読み取れる様々な情報を指しており、例えば顔の位置や大きさ、向き等が挙げられる。 Subsequently, in this embodiment, a hand presence range in which a human hand can exist is estimated based on the detected face information. Here, “face information” refers to various information that can be read from the detected face, and includes, for example, the position, size, orientation, and the like of the face.
Owing to the structure of the human body, a person's hand cannot be located extremely far from the face. In other words, if the position of a person's face is known, that person's hand can exist only within the reach of the arm extending from it. Therefore, once the face has been detected, the range in which the hand can exist can be estimated with high accuracy from the face information.
Once the hand presence range has been estimated, a process of detecting the person's hand is performed within that range; that is, the hand detection process is performed only within the hand presence range of the image. Examples of hand detection processes include matching against template images prepared in advance, methods using histograms of luminance gradient directions, and methods using the boundary shape of skin-colored regions. Such detection methods are merely examples, however, and the hand detection method of the present embodiment is not limited in any way.
If the hand presence range described above were not estimated, the hand detection process would have to be performed over the entire image. In that case, because the range to be processed is larger, both the time needed to detect a hand and the load imposed by the detection process would increase.

In the video processing apparatus according to the present embodiment, by contrast, the hand detection process is performed only within the hand presence range of the image, as described above. A hand can therefore be detected extremely efficiently. Compared with attempting to detect the hand directly without detecting the face, the present embodiment does incur the additional time and load of the face detection process. However, human face detection is a technically mature capability that has already been put into practical, widespread use, and it can be employed without imposing a new burden. The impact of the face detection process is therefore small enough to be almost negligible.
Furthermore, because the hand detection process in the present embodiment is performed only within the hand presence range, another person's hand located outside that range, for example, is prevented from being detected as the hand of the person of interest. That is, false detections are prevented, and the person's hand can be detected accurately.

As described above, the video processing apparatus according to the present embodiment can detect a person's hand in a video efficiently and accurately.
In one aspect of the video processing apparatus according to the present embodiment, the range estimation means estimates a circle centered on the person's face as the hand presence range.

According to this aspect, since the hand presence range is estimated as a circle centered on the person's face, the process of estimating the range can be kept extremely simple. Note that a "circle centered on the person's face" is not limited to a circle whose center coincides exactly with the center of the face; it is a broad concept covering various circles or ellipses that contain the face near their center.

The range in which a person's hand can exist is the reach of that person's arm, and the part serving as the arm's pivot (specifically, the area around the shoulder) can be considered to lie very close to the face. Therefore, estimating, for example, a circle whose radius equals the length of the person's arm as the hand presence range allows the hand to be detected more efficiently. The arm length can in turn be estimated from, for example, the size of the detected face.
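As a rough sketch of this aspect, the circular hand presence range can be derived from the detected face box alone. The function below is illustrative; the `arm_ratio` constant relating face height to arm reach is an assumed tuning parameter, not a value fixed by the text.

```python
def estimate_hand_range(face_x, face_y, face_w, face_h, arm_ratio=2.0):
    """Estimate a circular hand presence range from a detected face box.

    The circle is centered on the face; its radius approximates the reach
    of the arm, derived from the face height (arm_ratio is a hypothetical
    tuning constant).
    """
    cx = face_x + face_w / 2.0
    cy = face_y + face_h / 2.0
    radius = arm_ratio * face_h
    return cx, cy, radius

def in_hand_range(px, py, circle):
    """True if point (px, py) lies inside the estimated circle."""
    cx, cy, r = circle
    return (px - cx) ** 2 + (py - cy) ** 2 <= r ** 2
```

Hand detection would then scan only pixels for which `in_hand_range` is true, instead of the whole image.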
In another aspect of the video processing apparatus according to the present embodiment, the apparatus further comprises learning means for learning a relative positional relationship between the person's face and the person's hand, and the range estimation means estimates the hand presence range based on the learned relative positional relationship.

According to this aspect, before the hand presence range is estimated, the relative positional relationship between a person's face and hand is learned in advance by referring to many images. Here, "relative positional relationship" covers not only the simple positional relationship between the face and the hand but also information about their sizes and orientations.

Estimating the hand presence range from such learning results makes the range over which the hand detection process is performed more appropriate, so the hand can be detected extremely efficiently. Moreover, if the detection process is executed starting from the ranges where a hand is most likely to be present (that is, the ranges in which hands were detected most frequently while learning the relative positional relationship), the hand can be detected even more efficiently.
In another aspect of the video processing apparatus according to the present embodiment, the hand detection means includes scale estimation means for estimating the size of the person's hand based on the information about the person's face.

According to this aspect, when a face is detected, the size of the hand is estimated from the detected face information. For example, if the detected face is large, the hand to be detected is presumed to be large as well; conversely, if the detected face is small, the hand to be detected is presumed to be small.

In this aspect, because the size of the hand to be detected is estimated in advance, the hand detection process can be performed extremely efficiently and accurately.
In another aspect of the video processing apparatus according to the present embodiment, the hand detection means includes rotation angle estimation means for estimating the rotation angle of the person's hand based on the information about the person's face.

According to this aspect, when a face is detected, the rotation angle of the hand (that is, its inclination) is estimated from the detected face information. For example, the hand's rotation angle is estimated to lie within a predetermined range of the detected face's rotation angle; in other words, the fact that the rotation angles of the face and the hand tend to take relatively close values can be exploited to estimate the hand's rotation angle.

In this aspect, because the rotation angle of the hand to be detected is estimated in advance, the hand detection process can be performed extremely efficiently and accurately.
In another aspect of the video processing apparatus according to the present embodiment, the hand detection means includes skin color region detection means for detecting skin-colored regions in the image, and detects the person's hand in the skin-colored regions within the hand presence range.

According to this aspect, once the hand presence range has been estimated, skin-colored regions are detected in the image before the hand detection process is performed. The skin-colored regions are detected based on, for example, preset skin color components, or they may be detected according to the color of the detected face.

Once the skin-colored regions have been detected, the hand detection process is applied to the skin-colored regions within the hand presence range. That is, even inside the hand presence range, regions that are not skin-colored are not processed; the hand detection process runs only in the skin-colored regions within the range.

The color of a person's hand can be assumed to be a skin color unless, for example, gloves are worn. Detecting the skin-colored regions beforehand therefore narrows the range over which the hand detection process must run, so in this aspect the hand can be detected extremely efficiently.
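A minimal sketch of such a skin color test, assuming simple per-channel RGB ranges; the particular thresholds and the R > G > B heuristic are illustrative assumptions, since the text leaves the color components to be preset in advance or adapted to the color of the detected face.

```python
import numpy as np

def skin_mask(rgb_image,
              r_range=(95, 255), g_range=(40, 200), b_range=(20, 180)):
    """Boolean mask of 'skin-colored' pixels in an (H, W, 3) RGB image.

    The per-channel ranges are illustrative defaults; in practice they
    would be preset from sample data or adapted to the detected face.
    """
    r = rgb_image[..., 0]
    g = rgb_image[..., 1]
    b = rgb_image[..., 2]
    return ((r >= r_range[0]) & (r <= r_range[1]) &
            (g >= g_range[0]) & (g <= g_range[1]) &
            (b >= b_range[0]) & (b <= b_range[1]) &
            (r > g) & (g > b))  # skin tones tend to satisfy R > G > B
```

Only pixels where this mask is true, inside the hand presence range, would then be passed to the hand detector.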
In another aspect of the video processing apparatus according to the present embodiment, the hand detection means includes difference region detection means for comparing a past image with the current image and detecting a difference region in which the difference in luminance is equal to or greater than a predetermined value, and detects the person's hand in the difference region within the hand presence range.

According to this aspect, once the hand presence range has been estimated, the difference region is detected before the hand detection process is performed. The difference region is the region in which, comparing the past image and the current image, the luminance difference is equal to or greater than a predetermined value. Here, the "predetermined value" is set according to the difference in luminance between portions where a hand is present and portions where it is not.

Once the difference region has been detected, the hand detection process is applied to the difference region within the hand presence range. That is, even inside the hand presence range, regions that are not part of the difference region are not processed; the hand detection process runs only in the difference region within the range.

A person's hand can be assumed to move more than other objects in the image, so its position is likely to differ between the past image and the current image. A hand is therefore likely to be present in regions where the luminance difference is at least the predetermined value (that is, regions whose luminance has changed substantially). Detecting the difference region beforehand thus narrows the range over which the hand detection process must run, so in this aspect the hand can be detected extremely efficiently.
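The difference region described here can be sketched as a per-pixel luminance comparison between consecutive frames; the threshold value below is an assumed placeholder for the "predetermined value" mentioned above.

```python
import numpy as np

def difference_region(prev_luma, curr_luma, threshold=30):
    """Pixels whose luminance changed by at least `threshold` between
    the previous frame and the current frame.

    Inputs are 2-D uint8 luminance images; the threshold is an assumed
    value standing in for the text's predetermined value.
    """
    # Widen to a signed type so the subtraction cannot wrap around.
    diff = np.abs(curr_luma.astype(np.int16) - prev_luma.astype(np.int16))
    return diff >= threshold
```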
A video processing method according to the present embodiment comprises: a face detection step of detecting a person's face from an image; a range estimation step of estimating, based on information about the detected face of the person, a hand presence range in which the person's hand can exist; and a hand detection step of performing a process of detecting the person's hand within the hand presence range.

According to the video processing method of the present embodiment, as with the video processing apparatus of the present invention described above, a person's face is detected first, and the hand detection process is then performed within the hand presence range estimated from the detected face information. A person's hand in a video can therefore be detected efficiently and accurately.

The video processing method of the present invention can also adopt the various aspects described above for the video processing apparatus of the present invention.

The operation and further advantages of the various embodiments of the video processing apparatus and method according to the present invention will be explained in more detail in the examples below.
Embodiments of the present invention will now be described in detail with reference to the drawings.
<First embodiment>

First, the configuration of the video processing apparatus according to the first embodiment will be described with reference to FIG. 1. FIG. 1 is a block diagram showing the configuration of the video processing apparatus according to the first embodiment.
As shown in FIG. 1, the video processing apparatus according to the first embodiment comprises a video input unit 110, a face detection unit 120, a hand presence range estimation unit 130, a hand detection unit 140, and an output unit 150.

The video input unit 110 feeds the video to be processed into the video processing apparatus from outside. The video is input as, for example, a sequence of consecutive images.
The face detection unit 120 is an example of the "face detection means" of the present invention; it scans the input image and detects the face of any person appearing in the image.
The hand presence range estimation unit 130 is an example of the "range estimation means" of the present invention; it estimates the hand presence range (that is, the range in which the person's hand can exist) based on the detected face information, such as the position, size, and orientation of the face.
The hand detection unit 140 is an example of the "hand detection means" of the present invention; it performs the hand detection process within the estimated hand presence range and detects the person's hand.
The output unit 150 outputs information about the detected hand. The output hand information is used, for example, to operate other devices according to the behavior of the hand (for example, televisions, air conditioners, lighting fixtures, personal computers, game consoles, mobile devices, ticket machines, information terminals in public spaces such as digital signage, home equipment, and in-vehicle devices).
Next, the operation of the video processing apparatus according to the first embodiment will be described with reference to FIGS. 2 to 8 in addition to FIG. 1. FIG. 2 is a flowchart showing the operation of the video processing apparatus according to the first embodiment, FIG. 3 is a plan view showing an example of an input image, and FIG. 4 is a plan view conceptually showing the face detection method.
As shown in FIG. 2, when the video processing apparatus according to the first embodiment operates, the video input unit 110 first acquires an image constituting the video (step S01). The following description assumes that an image showing the face 210 and hand 220 of a person 200, as in FIG. 3, has been acquired.
When the image has been acquired, the face detection unit 120 detects the face 210 of the person 200 in the image (step S02). For example, the face detection unit 120 scans the entire acquired image and detects, as the face, a portion whose characteristics, such as luminance, take predetermined values recognizable as a human face.
As shown in FIG. 4, the face 210 is detected by drawing a rectangle that encloses it. At this time, the coordinates (x, y) of the upper-left corner of the rectangle enclosing the face 210, together with the width w and height h of the face 210, are obtained. Various other information, such as the orientation of the face, may also be detected in addition to these parameters.
Returning to FIG. 2, when the face 210 has been detected, the hand presence range is estimated based on the information about the detected face 210. The estimation method is described in detail below with reference to FIGS. 5 to 8. FIG. 5 is a flowchart showing the method of estimating the hand presence range, FIG. 6 is a plan view conceptually showing that method, FIG. 7 is a conceptual diagram showing how the scale of the hand to be detected is estimated, and FIG. 8 is a conceptual diagram showing how the rotation angle of the hand to be detected is estimated.
As shown in FIG. 5, the hand presence range estimation unit 130 estimates a circle centered on the detected face as the hand presence range (step S11). Specifically, a circle P1 such as that shown in FIG. 6 is estimated as the hand presence range.

Owing to the structure of the human body, a person's hand cannot be located extremely far from the face. That is, if the position of the person's face is known, the hand can exist only within the reach of the arm extending from it, so estimating, for example, a circle whose radius equals the length of the person's arm as the hand presence range allows the hand to be detected more efficiently. Here, the radius of the circle P1 is determined according to the size of the detected face 210; more specifically, it is set to twice the height h of the face 210.
When the hand presence range has been estimated, the scale of the hand to be detected is estimated based on the size of the detected face 210 (step S12). The hand scale is estimated to lie, for example, in the range of 0.7 to 1.5 times the height h of the detected face 210. Once the scale has been estimated, templates at five levels (ア to オ) are set, as shown for example in FIG. 7.

Subsequently, the rotation angle of the hand to be detected is estimated based on the rotation angle of the detected face 210 (step S13). The hand's rotation angle is estimated to lie, for example, within the range of -45 to +45 degrees of the rotation angle of the detected face 210. Once the rotation angle has been estimated, templates at seven levels (い to と) are set, as shown for example in FIG. 8.

When the hand scale and rotation angle have been estimated, it is determined whether the processing of steps S11 to S13 has been completed for every face in the image (step S14). If it has not (step S14: NO), steps S11 to S13 are performed again for the remaining faces. If it has (step S14: YES), the combinations of range, scale, and rotation angle over which hand detection will actually be performed are set (step S16). That is, the parameters estimated in steps S11 to S13 are combined with one another to set the conditions under which the hand is actually detected.
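Steps S12, S13, and S16 can be sketched as follows: five scale hypotheses spanning 0.7 to 1.5 times the face height, seven rotation hypotheses spanning plus or minus 45 degrees around the face angle, and their Cartesian product as the set of detection conditions. The even spacing of the levels is an assumption; the text gives only the ranges and the number of levels.

```python
import itertools

def hand_hypotheses(face_h, face_angle=0.0):
    """Enumerate (scale, rotation) template settings for one detected face.

    Scales span 0.7-1.5 times the face height in five levels and
    rotations span +/-45 degrees around the face angle in seven levels,
    matching steps S12 and S13; the even spacing is an assumption.
    """
    scales = [face_h * s for s in (0.7, 0.9, 1.1, 1.3, 1.5)]
    angles = [face_angle + a for a in (-45, -30, -15, 0, 15, 30, 45)]
    # Step S16: every (scale, rotation) pair becomes a detection condition.
    return list(itertools.product(scales, angles))

hyps = hand_hypotheses(60)  # face height h = 60 px, upright face
```

For multiple faces, the same enumeration would simply be repeated per face before detection begins.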
Returning to FIG. 2, once the conditions for hand detection have been set as described above, the hand detection unit 140 performs the hand detection process (step S04). The hand detection unit 140 detects the hand 220 in the image by matching against template images prepared in advance (see FIGS. 7 and 8). The hand may also be detected using, for example, a method based on histograms of luminance gradient directions or a method based on the boundary shape of skin-colored regions.
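One way to picture the range-restricted matching of step S04 is a sliding-window search that simply skips windows whose center falls outside the circle P1. The sum-of-squared-differences score below is a stand-in for whichever matching method is actually employed; it is a sketch, not the patent's implementation.

```python
import numpy as np

def match_in_range(image, template, circle):
    """Slide `template` over `image`, scoring only windows whose center
    lies inside the circular hand presence range.

    Returns the (row, col) of the best window's top-left corner, or None
    if no window falls inside the range. Scoring uses SSD as a simple
    stand-in for template matching.
    """
    cx, cy, r = circle
    th, tw = template.shape
    best, best_pos = None, None
    for y in range(image.shape[0] - th + 1):
        for x in range(image.shape[1] - tw + 1):
            wy, wx = y + th / 2.0, x + tw / 2.0   # window center
            if (wx - cx) ** 2 + (wy - cy) ** 2 > r ** 2:
                continue  # outside the hand presence range: skip entirely
            window = image[y:y + th, x:x + tw].astype(np.int32)
            score = int(((window - template) ** 2).sum())
            if best is None or score < best:
                best, best_pos = score, (y, x)
    return best_pos
```

The efficiency gain described in the text comes from the `continue`: every window outside the circle costs one distance check instead of a full match.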
If the hand presence range described above (see FIG. 6) were not estimated, the process of detecting the hand 220 would have to be performed over the entire image. In that case, because the range to be processed is larger, both the time needed to detect the hand 220 and the load imposed by the detection process would increase.

In the video processing apparatus according to the present embodiment, by contrast, the process of detecting the hand 220 is performed only within the hand presence range of the image, so the hand can be detected extremely efficiently. Furthermore, because the detection process runs only within the hand presence range, another person's hand located outside that range, for example, is prevented from being detected as the hand of the person of interest. That is, false detections are prevented, and the person's hand can be detected accurately.
When detection of the hand 220 has finished, the output unit 150 outputs the information about the detected hand as the result (step S05), for example the position, scale, and rotation angle of the detected hand 220. If multiple hands 220 have been detected, results are output for all of them.

As described above, according to the video processing apparatus of the first embodiment, the hand presence range is estimated based on the detected face information. A person's hand in a video can therefore be detected efficiently and accurately.
<Second embodiment>

Next, a video processing apparatus according to the second embodiment will be described with reference to FIGS. 9 and 10. FIG. 9 is a block diagram showing the configuration of the video processing apparatus according to the second embodiment, and FIG. 10 is a flowchart showing its operation. The second embodiment differs from the first embodiment described above in part of its configuration and operation and is otherwise largely the same. The following description therefore covers the parts that differ from the first embodiment in detail and omits the overlapping parts as appropriate.
As shown in FIG. 9, the video processing apparatus according to the second embodiment comprises, in addition to the same components as the first embodiment, a skin color region detection unit 160, a difference region detection unit 170, and a memory 175.

The skin color region detection unit 160 is an example of the "skin color region detection means" of the present invention; it detects skin-colored regions in the image. The skin-colored regions are detected based on, for example, preset skin color components, or they may be detected according to the color of the detected face 210.

The difference region detection unit 170 is an example of the "difference region detection means" of the present invention; it detects a difference region in which the difference in luminance between a past image stored in the memory 175 (for example, the image one frame earlier) and the current image is equal to or greater than a predetermined value.
As shown in FIG. 10, when the video processing apparatus according to the second embodiment operates, the video input unit 110 first acquires an image constituting the video (step S21). When the image has been acquired, the face detection unit 120 detects the face 210 of the person 200 in the image (step S22). That is, the same processing as in the first embodiment is performed.

In the second embodiment, in particular, the skin color region detection unit 160 then detects the skin-colored regions in the image (step S23), and the difference region detection unit 170 detects the difference region in the image using the past image stored in the memory 175 (step S24).
When the skin-colored regions and the difference region have been detected, the circle P1 centered on the detected face 210 is estimated as the hand presence range, as shown in FIG. 6 (step S25). Here, however, regions that are not skin-colored and regions that are not part of the difference region are excluded from the estimated range. That is, the hand presence range becomes the portion of the circle P1 centered on the face 210 that is both skin-colored and part of the difference region.

The color of the hand 220 can be assumed to be a skin color unless, for example, gloves are worn. The hand 220 can also be assumed to move more than other objects in the image, so its position is likely to differ between the past image and the current image, and the hand 220 is likely to be present in regions where the luminance difference is at least the predetermined value (that is, regions whose luminance has changed substantially). Consequently, detecting the skin-colored regions and the difference region beforehand further narrows the range over which the hand detection process must run (that is, the hand presence range).
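The narrowing described in step S25 amounts to intersecting three boolean masks: inside the circle P1, skin-colored, and part of the difference region. A sketch under that reading:

```python
import numpy as np

def hand_search_region(shape, circle, skin, diff):
    """Final search region of the second embodiment.

    A pixel is searched only if it lies inside the circle around the
    face AND is skin-colored AND changed since the last frame. `skin`
    and `diff` are boolean masks of the given (height, width) shape.
    """
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]          # per-pixel coordinates
    cx, cy, r = circle
    inside = (xs - cx) ** 2 + (ys - cy) ** 2 <= r ** 2
    return inside & skin & diff
```

The skin and difference masks could come from routines like those sketched earlier; only their intersection with the circle is handed to the detector.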
When the hand presence range has been estimated, the hand detection unit 140 performs the hand detection process (step S26), and the output unit 150 outputs the information about the detected hand as the result (step S27).

As described above, according to the video processing apparatus of the second embodiment, the hand presence range is estimated using the skin-colored regions and the difference region. A person's hand in a video can therefore be detected even more efficiently and accurately.
<Third embodiment>

Next, a video processing apparatus according to the third embodiment will be described with reference to FIGS. 11 to 13. FIG. 11 is a block diagram showing the configuration of the video processing apparatus according to the third embodiment, FIG. 12 is a conceptual diagram showing a plurality of images used for learning, and FIG. 13 is a plan view conceptually showing a hand presence range estimated based on learning. The third embodiment differs from the first and second embodiments described above in part of its configuration and operation and is otherwise largely the same. The following description therefore covers the parts that differ from the first and second embodiments in detail and omits the overlapping parts as appropriate.
As shown in FIG. 11, the video processing apparatus according to the third embodiment comprises, in addition to the same components as the first embodiment, a learning unit 180 and a learning storage unit 185.

The learning unit 180 is an example of the "learning means" of the present invention; it learns the relative positional relationship between a person's face and hand from the images stored in the learning storage unit 185. The learning storage unit 185 stores a plurality of images, as shown for example in FIG. 12.

Learning in the learning unit 180 may be performed each time the video input unit 110 acquires an image, or it may be performed in advance using predetermined images. That is, the learning unit 180 may learn in real time while the apparatus operates, or it may simply store the results of learning performed beforehand.
As shown in FIG. 13, unlike in the first and second embodiments described above, the hand presence range estimation unit 130 of the third embodiment estimates the hand presence range based on the learning results of the learning unit 180. Specifically, as shown in the figure, a range P2 in which the hand 220 is highly likely to be present, given the position of the face 210, is taken as the hand presence range. In FIG. 13, the inner circles indicate higher probabilities of a hand being present, and the hand detection process is performed in order from the inner circles (that is, from the regions where a hand is most likely to be present).

Estimating the hand presence range from the learning results yields a more appropriate range than estimating a circle P1 centered on the face 210 as the hand presence range, as in FIG. 6. Specifically, the portions above and to the right of the face 210 in the figure, where the probability of a hand being present is low, can be excluded, so the hand presence range can be estimated as a narrower range. The efficiency of the hand detection process can thus be increased further.
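A simple way to realize such a learned range is a 2-D histogram of hand positions relative to the face, normalized by face height so faces of different sizes can be pooled, with detection visiting the most frequent cells first. The bin count, reach, and normalization below are assumptions for illustration; the patent does not specify the representation.

```python
import numpy as np

def learn_offset_histogram(samples, bins=8, reach=2.0):
    """Accumulate hand positions relative to the face into a 2-D histogram.

    Each sample is (face_x, face_y, face_h, hand_x, hand_y). Offsets are
    expressed in face-height units and clipped to +/-`reach`; `bins` and
    `reach` are assumed parameters.
    """
    hist = np.zeros((bins, bins), dtype=np.int64)
    for fx, fy, fh, hx, hy in samples:
        u = (hx - fx) / fh                 # horizontal offset, face units
        v = (hy - fy) / fh                 # vertical offset, face units
        i = int((v + reach) / (2 * reach) * bins)
        j = int((u + reach) / (2 * reach) * bins)
        if 0 <= i < bins and 0 <= j < bins:
            hist[i, j] += 1
    return hist

def search_order(hist):
    """Cell indices sorted so the most frequently observed offsets
    (the most probable hand locations) are scanned first."""
    flat = np.argsort(hist, axis=None)[::-1]
    return [tuple(np.unravel_index(int(k), hist.shape)) for k in flat]
```

At run time, a detected face would map each histogram cell back to an image region, and detection would proceed in the order returned by `search_order`.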
As described above, according to the video processing apparatus of the third embodiment, the hand presence range is estimated based on learning results. A person's hand in a video can therefore be detected even more efficiently and accurately.

The present invention is not limited to the embodiments described above and may be modified as appropriate without departing from the gist or spirit of the invention as read from the claims and the specification as a whole; video processing apparatuses and methods incorporating such modifications are also within the technical scope of the present invention.
110 Video input unit
120 Face detection unit
130 Hand presence range estimation unit
140 Hand detection unit
150 Output unit
160 Skin color region detection unit
170 Difference region detection unit
175 Memory
180 Learning unit
185 Learning storage unit
200 Person
210 Face
220 Hand
Claims (8)

A video processing apparatus comprising:
face detection means for detecting a person's face from an image;
range estimation means for estimating, based on information about the detected face of the person, a hand presence range in which the person's hand can exist; and
hand detection means for performing a process of detecting the person's hand within the hand presence range.

The video processing apparatus according to claim 1, further comprising learning means for learning a relative positional relationship between the person's face and the person's hand, wherein the range estimation means estimates the hand presence range based on the learned relative positional relationship.

A video processing method comprising:
a face detection step of detecting a person's face from an image;
a range estimation step of estimating, based on information about the detected face of the person, a hand presence range in which the person's hand can exist; and
a hand detection step of performing a process of detecting the person's hand within the hand presence range.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2010-169673 | 2010-07-28 | ||
| JP2010169673 | 2010-07-28 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2012014590A1 true WO2012014590A1 (en) | 2012-02-02 |
Family
ID=45529811
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/JP2011/063701 Ceased WO2012014590A1 (en) | 2010-07-28 | 2011-06-15 | Video processing device and method |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2012014590A1 (en) |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2005216061A (en) * | 2004-01-30 | 2005-08-11 | Sony Computer Entertainment Inc | Image processor, image processing method, recording medium, computer program and semiconductor device |
| JP2006155563A (en) * | 2004-11-05 | 2006-06-15 | Fuji Xerox Co Ltd | Motion analysis device |
| WO2009018161A1 (en) * | 2007-07-27 | 2009-02-05 | Gesturetek, Inc. | Enhanced camera-based input |
2011
- 2011-06-15 WO PCT/JP2011/063701 patent/WO2012014590A1/en not_active Ceased
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JPWO2014073384A1 (en) * | 2012-11-06 | 2016-09-08 | 株式会社ソニー・インタラクティブエンタテインメント | Information processing device |
| US9672413B2 (en) | 2012-11-06 | 2017-06-06 | Sony Corporation | Setting operation area for input according to face position |
| JP2023512359A (en) * | 2020-12-29 | 2023-03-27 | 商▲湯▼国▲際▼私人有限公司 | Associated object detection method and apparatus |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 11812183; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 11812183; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: JP |