WO2012014590A1 - Video processing device and method - Google Patents
- Publication number
- WO2012014590A1 (PCT/JP2011/063701)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- hand
- person
- face
- range
- detecting
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
- G06T7/74—Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/107—Static hand or arm
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
- G06T2207/30201—Face
Definitions
- The present invention relates to the technical field of video processing apparatuses and methods that detect a predetermined object present in a video by, for example, performing various processes on the video.
- Non-Patent Document 1 proposes a technique of learning an SVM (Support Vector Machine) using histograms of luminance gradient directions as feature quantities, and detecting a person in a video.
- The present invention has been made in view of, for example, the problems described above, and its object is to provide a video processing apparatus and method capable of efficiently and accurately detecting a human hand in a video.
- To solve the above problem, the video processing apparatus of the present invention comprises face detection means for detecting a person's face from an image; range estimation means for estimating, based on information on the detected face, a hand presence range in which the person's hand can exist; and hand detection means for performing a process of detecting the person's hand within the hand presence range.
- To solve the above problem, the video processing method of the present invention comprises a face detection step of detecting a person's face from an image; a range estimation step of estimating, based on information on the detected face, a hand presence range in which the person's hand can exist; and a hand detection step of performing a process of detecting the person's hand within the hand presence range.
- The video processing apparatus according to the present embodiment includes a face detection unit that detects a person's face from an image, a range estimation unit that estimates a hand presence range in which the person's hand can exist based on the detected face information, and a hand detection unit that performs a process of detecting the person's hand within the hand presence range.
- In operation, a human face is first detected from the images constituting the video. Specifically, for example, the entire image is scanned, and a portion whose luminance and other conditions match predetermined values recognizable as a human face is detected as a face.
- a hand presence range in which a human hand can exist is estimated based on the detected face information.
- face information refers to various information that can be read from the detected face, and includes, for example, the position, size, orientation, and the like of the face.
- Because of the structure of the human body, a person's hand cannot exist at a position very far from the face. That is, if the position of a person's face is known, that person's hand can exist only within arm's reach of it. Therefore, if the face can be detected, the range in which the hand can exist can be estimated with high accuracy based on the face information.
- processing for detecting a human hand in the hand presence range is performed. That is, processing for detecting a person's hand is performed only in the hand presence range in the image.
- Examples of the process for detecting a hand include matching with a template image prepared in advance, a method using a luminance gradient direction histogram, a method using a boundary line shape of a skin color region, and the like.
- Such detection methods are merely examples, however, and the hand detection method according to the present embodiment is not limited in any way.
- If the above-described hand presence range were not estimated, the hand detection process would have to be performed on the entire image. In this case, the wide processing range increases both the time required to detect a hand and the load of the detection process.
- In this embodiment, by contrast, the process of detecting the hand is performed only within the hand presence range of the image. Therefore, a hand can be detected very efficiently.
- Compared with detecting the hand directly without detecting the face, the face detection step does add some detection time and processing load.
- However, human face detection is technically mature and already in practical use, so it can be used without imposing a new burden. For this reason, the influence of the face detection process is almost negligible.
- Furthermore, since the hand detection process is performed only within the hand presence range, when another person's hand exists outside that range, for example, it is prevented from being detected as the hand of the target person. That is, erroneous detection can be prevented and a person's hand can be detected accurately.
- According to the video processing apparatus of the present embodiment, therefore, the hand of a person in the video can be detected efficiently and accurately.
- the range estimation means estimates a circle centered on the person's face as the hand presence range.
- the process for estimating the hand presence range can be made extremely simple.
- The "circle centered on a person's face" here is not limited to a circle centered exactly on the middle of the face; it is a broad concept including various circles, or even ellipses, that contain the face near their center.
- the range in which a person's hand can exist is the range in which the person's arm extends. Then, it is considered that the portion serving as a fulcrum of the arm (specifically, near the shoulder) is located very close to the face. Therefore, for example, if a circle whose radius is the length of the person's arm is estimated as the hand presence range, the hand can be detected more efficiently.
- the length of the arm can be estimated from the size of the detected face, for example.
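As a minimal sketch of this aspect (the function names, the `(x, y, w, h)` face-box format, and the radius factor of twice the face height are illustrative assumptions drawn from the first embodiment described later, not a definitive implementation), the circular hand presence range can be derived from a detected face rectangle, using the face size as a proxy for arm length:

```python
# Sketch: estimate the hand presence range as a circle around a detected
# face. The radius factor (2x the face height) follows the first
# embodiment's example; the (x, y, w, h) box format is an assumption.

def estimate_hand_presence_range(face_box, radius_factor=2.0):
    """Return (cx, cy, radius) of the circle in which a hand may exist."""
    x, y, w, h = face_box
    cx = x + w / 2.0            # center of the detected face rectangle
    cy = y + h / 2.0
    radius = radius_factor * h  # arm's reach approximated from face size
    return cx, cy, radius

def point_in_range(px, py, hand_range):
    """True if pixel (px, py) lies inside the hand presence circle."""
    cx, cy, r = hand_range
    return (px - cx) ** 2 + (py - cy) ** 2 <= r ** 2

hand_range = estimate_hand_presence_range((100, 50, 40, 60))
```

A detector would then skip any pixel for which `point_in_range` is false, which is how the narrowing described above saves processing time.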
- In another aspect, the video processing apparatus further includes a learning unit that learns the relative positional relationship between a person's face and the person's hand, and the range estimation unit estimates the hand presence range based on the learned relative positional relationship.
- the relative positional relationship of the person's face and the person's hand is learned by referring to many images in advance.
- the “relative positional relationship” includes not only a simple positional relationship between a person's face and a person's hand, but also information regarding the size and orientation of the face and hand.
- the range in which the process of detecting the hand is performed becomes a more appropriate range. Therefore, it is possible to detect the hand very efficiently.
- If the detection process is performed in order starting from ranges where a hand is most likely to exist (that is, ranges in which a hand was detected with high frequency during learning of the relative positional relationship), the hand can be detected even more efficiently.
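The ordering idea above can be sketched as follows. This is a hypothetical illustration: the patent does not specify a data structure, so a small grid of per-cell hand-observation counts (positions relative to the face, accumulated during learning) is assumed here.

```python
# Hypothetical sketch: order candidate search cells by how often a hand
# was observed there relative to the face during learning, so the most
# probable regions are searched first. The grid values are illustrative.

def search_order(frequency_grid):
    """Yield (row, col) cells from highest to lowest learned hand frequency."""
    cells = [(r, c) for r, row in enumerate(frequency_grid)
                    for c, _ in enumerate(row)]
    return sorted(cells, key=lambda rc: frequency_grid[rc[0]][rc[1]],
                  reverse=True)

freq = [[0, 2, 1],
        [5, 9, 4],
        [3, 8, 6]]
order = search_order(freq)
```

Searching cells in this order lets the detector stop early once a hand is found in a high-probability region.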
- the hand detection unit includes a scale estimation unit that estimates the size of the person's hand based on information about the person's face.
- the size of the hand is estimated based on the detected face information. For example, if the face is detected large, it is estimated that the detected hand will be large. On the other hand, if the face is detected small, it is estimated that the detected hand will be small.
- the process of detecting the hand can be performed extremely efficiently and accurately.
- the hand detection unit includes a rotation angle estimation unit that estimates a rotation angle of the person's hand based on information on the person's face.
- the rotation angle of the hand (that is, the hand inclination) is estimated based on the detected face information.
- the hand rotation angle is estimated to be within a predetermined range with respect to the detected face rotation angle. That is, the hand rotation angle can be estimated by utilizing the fact that the face rotation angle and the hand rotation angle are relatively close to each other.
- the process of detecting the hand can be performed very efficiently and accurately.
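The scale and rotation aspects above can be sketched together. The concrete ranges (0.7 to 1.5 times the face height; plus or minus 45 degrees of the face rotation) and step counts (five scale steps, seven rotation steps) are taken from the first embodiment described later in this document; the function names and the evenly-spaced stepping are illustrative assumptions.

```python
# Sketch: derive candidate hand scales and rotation angles from the
# detected face, using the first embodiment's example ranges.

def candidate_scales(face_height, low=0.7, high=1.5, steps=5):
    """Hand-size candidates as multiples of the face height (0.7x .. 1.5x)."""
    step = (high - low) / (steps - 1)
    return [round((low + i * step) * face_height, 2) for i in range(steps)]

def candidate_rotations(face_angle, spread=45, steps=7):
    """Hand rotation candidates within +/- 45 degrees of the face angle."""
    step = 2 * spread / (steps - 1)
    return [face_angle - spread + i * step for i in range(steps)]

scales = candidate_scales(100)      # face height h = 100 px
rotations = candidate_rotations(0)  # upright face
```

Restricting template matching to these few scale/rotation candidates, instead of trying every size and angle, is what makes the detection both faster and less prone to false matches.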
- In another aspect, the hand detection means includes skin color area detection means for detecting a skin color area from the image, and the person's hand is detected in the skin color area within the hand presence range.
- the skin color region is detected from the image before the processing for detecting the hand is performed.
- The skin color region is detected based on, for example, a preset skin color component. Alternatively, it may be detected based on the color of the detected face.
- a process for detecting the hand is performed on the skin color area in the hand presence range. That is, even in the hand presence range, the process for detecting the hand is not performed for the non-skin color area, and the process for detecting the hand is performed only in the skin color area in the hand presence range.
- The color of a person's hand can be assumed to be a skin color unless, for example, gloves are worn. For this reason, if the skin color area is detected, the range in which the hand detection process is performed can be further narrowed, so the hand can be detected very efficiently.
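A minimal sketch of this skin-color narrowing follows. The RGB thresholds in the predicate are an illustrative assumption, not values from the patent; as the text notes, a real implementation might instead calibrate the predicate from the color of the detected face.

```python
# Sketch: keep only pixels that are both inside the hand presence circle
# and skin-colored. The image is a nested list of (r, g, b) tuples and
# the skin predicate is a crude illustrative threshold.

def is_skin_color(r, g, b):
    """Very rough skin-color predicate (assumed thresholds)."""
    return r > 95 and g > 40 and b > 20 and r > g and r > b

def skin_pixels_in_range(image, hand_range):
    """Pixels inside the hand presence circle that are skin-colored."""
    cx, cy, radius = hand_range
    hits = []
    for y, row in enumerate(image):
        for x, (r, g, b) in enumerate(row):
            inside = (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2
            if inside and is_skin_color(r, g, b):
                hits.append((x, y))
    return hits

image = [
    [(200, 150, 120), (10, 10, 10)],    # skin-like, dark
    [(30, 30, 200),   (210, 160, 130)], # blue, skin-like
]
hits = skin_pixels_in_range(image, (0.0, 0.0, 1.2))
```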
- In another aspect, the hand detection unit includes a difference area detection unit that compares a past image with the current image and detects a difference area in which the difference in luminance is equal to or greater than a predetermined value, and the person's hand is detected in the difference area within the hand presence range.
- the difference area is detected before the process of detecting the hand is performed.
- the difference area is an area where a difference in luminance is equal to or greater than a predetermined value by comparing a past image and a current image.
- the “predetermined value” is a value set according to the difference in luminance between the portion where the hand is present and the portion where the hand is not present.
- A process for detecting a hand is then performed on the difference area within the hand presence range. That is, even within the hand presence range, the hand detection process is not performed on regions that are not difference areas; it is performed only in the difference area within the hand presence range.
- a person's hand is considered to move much more than other objects in the image. For this reason, there is a high possibility that the position of the hand is different between the past image and the current image. Therefore, it is estimated that there is a high possibility that a hand is present in an area where the luminance difference is equal to or greater than a predetermined value (that is, an area where the luminance has greatly changed). For this reason, if the difference area is detected, the range for performing the hand detection process can be further narrowed. Therefore, in this aspect, it is possible to detect a hand very efficiently.
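The frame-difference step above can be sketched as follows. The grayscale frame layout (lists of rows) and the threshold value are assumptions for illustration; the patent only specifies that the threshold is set according to the luminance difference between hand and non-hand portions.

```python
# Sketch: compare the previous and current grayscale frames and keep the
# pixels whose luminance change meets a threshold -- the "difference
# area" in which a moving hand is likely to be found.

def difference_region(prev_frame, curr_frame, threshold=30):
    """Return (x, y) coordinates whose luminance changed by >= threshold."""
    region = []
    for y, (prev_row, curr_row) in enumerate(zip(prev_frame, curr_frame)):
        for x, (p, c) in enumerate(zip(prev_row, curr_row)):
            if abs(c - p) >= threshold:
                region.append((x, y))
    return region

prev_frame = [[100, 100, 100],
              [100, 100, 100]]
curr_frame = [[100, 160, 100],   # a bright region moved into row 0
              [105, 100, 240]]
moved = difference_region(prev_frame, curr_frame)
```

Intersecting `moved` with the hand presence circle gives the final, much smaller search area described in this aspect.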
- The video processing method according to the present embodiment includes a face detection step of detecting a person's face from an image, a range estimation step of estimating a hand presence range in which the person's hand may exist based on the detected face information, and a hand detection step of performing a process of detecting the person's hand within the hand presence range.
- According to the video processing method of the present embodiment, a person's face is detected, and the hand detection process is performed within the hand presence range estimated based on the detected face information. Therefore, the hand of a person in the video can be detected efficiently and accurately.
- The video processing method of the present invention can also adopt various aspects similar to those of the video processing apparatus of the present invention described above.
- FIG. 1 is a block diagram showing the configuration of the video processing apparatus according to the first embodiment.
- The video processing apparatus includes a video input unit 110, a face detection unit 120, a hand presence range estimation unit 130, a hand detection unit 140, and an output unit 150.
- the video input unit 110 inputs a video to be detected from the outside of the apparatus to the video processing apparatus.
- the video is input as, for example, a plurality of continuous images.
- the face detection unit 120 is an example of the “face detection unit” of the present invention, and scans an input image to detect the face of a person in the image.
- the hand presence range estimation unit 130 is an example of the “range estimation unit” of the present invention, and based on the detected face information (for example, the position, size, orientation, etc. of the face) The range in which a person's hand can exist is estimated.
- the hand detection unit 140 is an example of the “hand detection unit” of the present invention, and performs a process of detecting a hand in the estimated hand presence range to detect a person's hand.
- the output unit 150 outputs information on the detected human hand.
- The hand information output here is used, for example, to operate other devices according to the movement of the hand (for example, televisions, air conditioners, lighting devices, personal computers, game machines, portable devices, ticket machines, digital signage and other information terminals in public spaces, housing equipment, automotive in-vehicle equipment, and the like).
- FIG. 2 is a flowchart showing the operation of the video processing apparatus according to the first embodiment.
- FIG. 3 is a plan view showing an example of the input image
- FIG. 4 is a plan view conceptually showing the face detection method.
- the video input unit 110 first acquires an image constituting the video (step S01).
- an image showing the face 210 and the hand 220 of the person 200 is acquired as shown in FIG.
- Next, the face detection unit 120 detects the face 210 of the person 200 in the image (step S02). For example, the face detection unit 120 scans the entire acquired image, and a portion whose luminance and other conditions match predetermined values recognizable as a human face is detected as a face.
- the face 210 is detected so as to draw a rectangle surrounding the face 210.
- the coordinates (x, y) of the upper left point of the rectangle surrounding the face 210 and the values of the width w and the height h of the face 210 are acquired.
- various information such as the orientation of the face may be detected.
- FIG. 5 is a flowchart showing the hand presence range estimation method
- FIG. 6 is a plan view conceptually showing the hand presence range estimation method
- FIG. 7 is a conceptual diagram showing a method for estimating the scale of the hand to be detected
- FIG. 8 is a conceptual diagram showing a method for estimating the rotation angle of the hand to be detected.
- the hand presence range estimation unit 130 estimates a circle centered on the detected face as the hand presence range (step S11). Specifically, a circle P1 as shown in FIG. 6 is estimated as the hand presence range.
- the human hand cannot exist at a position very far from the face due to the structure of the human body. That is, if the position of the person's face is known, the hand can only exist within the range where the arm extends from there. Therefore, for example, if a circle whose radius is the length of the person's arm is estimated as the hand presence range, the hand can be detected more efficiently.
- the radius of the circle P1 is determined according to the size of the detected face 210. More specifically, twice the height h of the face 210 is the radius of the circle P1.
- the scale of the hand that will be detected is estimated based on the size of the detected face 210 (step S12).
- the hand scale is estimated to be in the range of 0.7 to 1.5 times the height h of the detected face 210, for example.
- In the present embodiment, a template with five scale steps is set.
- the rotation angle of the hand that will be detected is estimated (step S13).
- The rotation angle of the hand is estimated to be in the range of +45 degrees to −45 degrees with respect to the detected rotation angle of the face 210, for example.
- In the present embodiment, a template with seven rotation steps is set.
- In step S14, it is determined whether the processing for all the faces in the image (that is, the processing from step S11 to step S13) has been completed. If not (step S14: NO), the processing from step S11 to step S13 is performed again for the remaining faces. If it has been completed for all faces (step S14: YES), combinations of the range, scale, and rotation angle for actually performing hand detection are set (step S16). That is, the parameters estimated in steps S11 to S13 are combined with each other to set the conditions under which the hand is actually detected.
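The parameter-combination step (step S16) can be sketched as below. The dictionary layout and field names are illustrative assumptions; the point is that each face yields a Cartesian product of its scale candidates and rotation candidates, each tagged with that face's presence range.

```python
# Sketch of step S16: combine the per-face estimates (presence range,
# scale candidates, rotation candidates) into the list of conditions
# under which template matching is actually attempted.

def detection_conditions(face_estimates):
    """Cartesian product of scales x rotations per face, tagged with range."""
    conditions = []
    for est in face_estimates:
        for scale in est["scales"]:
            for angle in est["rotations"]:
                conditions.append({
                    "range": est["range"],   # circle (cx, cy, radius)
                    "scale": scale,
                    "rotation": angle,
                })
    return conditions

# One face of height 60 px: 5 scale steps (0.7x .. 1.5x) and 7 rotation
# steps (-45 .. +45 degrees), per the first embodiment's example values.
faces = [{"range": (120.0, 80.0, 120.0),
          "scales": [42.0, 54.0, 66.0, 78.0, 90.0],
          "rotations": [-45, -30, -15, 0, 15, 30, 45]}]
conds = detection_conditions(faces)
```

With these example values, each face contributes 5 × 7 = 35 matching conditions, all confined to its presence circle.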
- the hand detecting process is performed in the hand detecting unit 140 (step S04).
- the hand detection unit 140 detects the hand 220 in the image by matching with a template image prepared in advance (see FIGS. 7 and 8). Note that it is also possible to perform hand detection using a method using a luminance gradient direction histogram or a method using a boundary line shape of a skin color region.
- the above-described hand presence range (see FIG. 6) is not estimated, it is required to perform processing for detecting the hand 220 on the entire image. In this case, the time until the hand 220 is detected and the load due to the detection process increase due to the wide processing range.
- the process of detecting the hand 220 is performed only in the hand presence range in the image. Therefore, it is possible to detect a hand very efficiently. Furthermore, since the process of detecting the hand only in the hand presence range is performed, for example, when the hand of another person exists outside the hand presence range, the hand of the other person is detected. Can be prevented from being detected as a hand. That is, it is possible to prevent erroneous detection and accurately detect a human hand.
- When the detection of the hand 220 is completed, the output unit 150 outputs the detected hand information as a result (step S05).
- the output unit 150 outputs, for example, the detected position, scale, and rotation angle of the hand 220. If a plurality of hands 220 are detected, the results are output for all hands 220.
- the hand presence range is estimated based on the detected face information. Therefore, it is possible to efficiently and accurately detect the hand of a person in the video.
- FIG. 9 is a block diagram showing the configuration of the video processing apparatus according to the second embodiment.
- FIG. 10 is a flowchart showing the operation of the video processing apparatus according to the second embodiment.
- The second embodiment differs from the first embodiment described above in part of its configuration and operation, and the other parts are substantially the same. For this reason, the following description details the parts that differ from the first embodiment, and other descriptions are omitted as appropriate.
- The video processing apparatus is configured to include a skin color area detection unit 160, a difference area detection unit 170, and a memory 175, in addition to the same configuration as that of the first embodiment.
- the skin color area detection unit 160 is an example of the “skin color area detection unit” of the present invention, and detects a skin color area in an image.
- The skin color region is detected based on, for example, a preset skin color component. Alternatively, it may be detected based on the color of the detected face 210.
- the difference area detection unit 170 is an example of the “difference area detection unit” of the present invention, and the difference in luminance between the past image (for example, the image one frame before) stored in the memory 175 and the current image is A difference area that is equal to or greater than a predetermined value is detected.
- the video input unit 110 acquires an image constituting the video (step S21).
- the face detection unit 120 detects the face 210 of the person 200 in the image (step S22). That is, the same processing as in the first embodiment is performed.
- the skin color area detection unit 160 detects a skin color area in the image (step S23). Further, the difference area detection unit 170 detects a difference area in the image using the past image stored in the memory 175 (step S24).
- the circle P1 centered on the detected face 210 is estimated as the hand presence range (step S25).
- From the hand presence range estimated here, areas that are not skin color areas and areas that are not difference areas are excluded. That is, the hand presence range is the part of the circle P1 centered on the face 210 that is both a skin color region and a difference region.
- the color of the hand 220 is considered to be a skin color unless, for example, gloves are used.
- The hand 220 in the image is considered to move more than other objects. For this reason, there is a high possibility that the position of the hand 220 differs between the past image and the current image. Therefore, it is estimated that the hand 220 is highly likely to be present in an area where the luminance difference is equal to or greater than a predetermined value (that is, an area where the luminance has changed greatly).
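The second embodiment's narrowing therefore amounts to an intersection of three candidate sets. A minimal sketch (representing each area as a set of pixel coordinates is an illustrative assumption):

```python
# Sketch: a pixel is searched only if it lies inside the face-centered
# circle AND in the skin color area AND in the frame-difference area.

def narrowed_hand_range(circle_pixels, skin_pixels, diff_pixels):
    """Intersection of the three candidate pixel sets."""
    return circle_pixels & skin_pixels & diff_pixels

circle = {(1, 1), (1, 2), (2, 1), (2, 2)}   # inside circle P1
skin   = {(1, 1), (2, 2), (5, 5)}           # skin-colored pixels
diff   = {(2, 2), (3, 3)}                   # luminance changed pixels
search_area = narrowed_hand_range(circle, skin, diff)
```

Only the pixels surviving all three conditions are passed to the hand detection unit 140, which is why this embodiment is more efficient than the first.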
- Hand detection processing is then performed in the hand detection unit 140 (step S26). Then, the output unit 150 outputs the detected hand information as a result (step S27).
- the hand presence range is estimated based on the skin color area and the difference area. Therefore, it is possible to detect the hand of a person in the video more efficiently and accurately.
- FIG. 11 is a block diagram showing the configuration of the video processing apparatus according to the third embodiment.
- FIG. 12 is a conceptual diagram showing a plurality of images used for learning
- FIG. 13 is a plan view conceptually showing a hand presence range estimated based on learning.
- The third embodiment differs from the first and second embodiments described above in part of its configuration and operation, and the other parts are substantially the same. For this reason, the following description details the parts that differ from the first and second embodiments, and other descriptions are omitted as appropriate.
- the video processing apparatus includes a learning unit 180 and a learning storage unit 185 in addition to the same configuration as that of the first embodiment.
- the learning unit 180 is an example of the “learning unit” of the present invention, and learns the relative positional relationship between the face of the person and the hand from the image stored in the learning storage unit 185.
- the learning storage unit 185 stores a plurality of images, for example, as shown in FIG.
- the learning in the learning unit 180 may be performed every time an image is acquired in the video input unit 110, or may be learned in advance using a predetermined image. That is, the learning unit 180 may perform learning in real time during operation of the apparatus, or may simply store the results of learning performed in advance.
- Unlike the first and second embodiments described above, the hand presence range estimation unit 130 estimates the hand presence range based on the learning result in the learning unit 180. Specifically, as shown in FIG. 13, a range P2 in which the hand 220 is highly likely to be present with respect to the position of the face 210 is estimated. In FIG. 13, the inner circles indicate positions where a hand is more likely to be present.
- If the hand presence range is estimated based on the learning result, a more appropriate hand presence range can be estimated than when, as shown in FIG. 6, the circle P1 centered on the face 210 is estimated as the hand presence range. Specifically, since the upper and right portions of the face 210 in the figure, where the hand presence probability is low, can be excluded, the hand presence range can be estimated as a narrower range. Therefore, the efficiency of the hand detection process can be further increased.
- the hand presence range is estimated based on the learning result. Therefore, it is possible to detect the hand of a person in the video more efficiently and accurately.
Description
The present invention relates to the technical field of video processing apparatuses and methods that detect a predetermined object present in a video by, for example, performing various processes on the video.
As this type of video processing device, there are devices that detect a predetermined object, such as a person, from captured video. When detecting an object, local image features common to the objects to be detected are used. For example, Non-Patent Document 1 proposes a technique of learning an SVM (Support Vector Machine) using histograms of luminance gradient directions as feature quantities, and detecting a person in a video.
However, in the technique described above, detecting an object requires first scanning the entire image, which makes an increase in detection time and processing load unavoidable. Such an inconvenience can be kept relatively small when detecting an object that is easy to detect (that is, one with clear features), such as a human face. On the other hand, it becomes a comparatively large problem when detecting an object that is hard to detect, such as a human hand (for example, because few features are available for detection, or because the shape is not fixed and changes greatly with the situation). That is, the above technique has the technical problem that it is difficult to detect various objects efficiently.
The present invention has been made in view of, for example, the problems described above, and its object is to provide a video processing apparatus and method capable of efficiently and accurately detecting a human hand in a video.
To solve the above problem, the video processing apparatus of the present invention comprises face detection means for detecting a person's face from an image; range estimation means for estimating, based on information on the detected face, a hand presence range in which the person's hand can exist; and hand detection means for performing a process of detecting the person's hand within the hand presence range.
To solve the above problem, the video processing method of the present invention comprises a face detection step of detecting a person's face from an image; a range estimation step of estimating, based on information on the detected face, a hand presence range in which the person's hand can exist; and a hand detection step of performing a process of detecting the person's hand within the hand presence range.
The operation and advantages of the present invention will become apparent from the embodiments for carrying out the invention described below.
The video processing apparatus according to the present embodiment includes a face detection unit that detects a person's face from an image, a range estimation unit that estimates a hand presence range in which the person's hand can exist based on the detected face information, and a hand detection unit that performs a process of detecting the person's hand within the hand presence range.
本実施形態に係る映像処理装置によれば、その動作時に、先ず映像を構成する画像から人物の顔が検出される。具体的には、例えば画像全体がスキャンされ、輝度等の条件が、人間の顔であると認識できるような所定の値となっている部分が顔として検出される。 According to the video processing apparatus according to the present embodiment, during the operation, a human face is first detected from the images constituting the video. Specifically, for example, the entire image is scanned, and a portion having a predetermined value such that the condition such as luminance can be recognized as a human face is detected as a face.
続いて本実施形態では、検出された顔の情報に基づいて、人物の手が存在し得る手存在範囲が推定される。尚、ここでの「顔の情報」とは、検出された顔から読み取れる様々な情報を指しており、例えば顔の位置や大きさ、向き等が挙げられる。 Subsequently, in this embodiment, a hand presence range in which a human hand can exist is estimated based on the detected face information. Here, “face information” refers to various information that can be read from the detected face, and includes, for example, the position, size, orientation, and the like of the face.
Owing to the structure of the human body, a person's hand cannot be located extremely far from the face. In other words, if the position of a person's face is known, that person's hand can exist only within the reach of the arm extending from it. Therefore, once the face has been detected, the range in which the hand can exist can be estimated with high accuracy from the face information.
Once the hand presence range has been estimated, a process of detecting the person's hand is performed within that range; that is, the hand detection process is performed only within the hand presence range of the image. Examples of hand detection processes include matching against template images prepared in advance, methods using histograms of luminance gradient directions, and methods using the boundary shape of skin-colored regions. Such detection methods are merely examples, however, and the hand detection method of the present embodiment is not limited in any way.
If the hand presence range described above were not estimated, the hand detection process would have to be performed over the entire image. In that case, because the range to be processed is larger, both the time needed to detect a hand and the load imposed by the detection process would increase.

In the video processing apparatus according to the present embodiment, by contrast, the hand detection process is performed only within the hand presence range of the image, as described above. A hand can therefore be detected extremely efficiently. Compared with attempting to detect the hand directly without detecting the face, the present embodiment does incur the additional time and load of the face detection process. However, human face detection is a technically mature capability that has already been put into practical, widespread use, and it can be employed without imposing a new burden. The impact of the face detection process is therefore small enough to be almost negligible.
Furthermore, because the hand detection process in the present embodiment is performed only within the hand presence range, another person's hand located outside that range, for example, is prevented from being detected as the hand of the person of interest. That is, false detections are prevented, and the person's hand can be detected accurately.

As described above, the video processing apparatus according to the present embodiment can detect a person's hand in a video efficiently and accurately.
In one aspect of the video processing apparatus according to the present embodiment, the range estimation means estimates a circle centered on the person's face as the hand presence range.

According to this aspect, since the hand presence range is estimated as a circle centered on the person's face, the process of estimating the range can be kept extremely simple. Note that a "circle centered on the person's face" is not limited to a circle whose center coincides exactly with the center of the face; it is a broad concept covering various circles or ellipses that contain the face near their center.

The range in which a person's hand can exist is the reach of that person's arm, and the part serving as the arm's pivot (specifically, the area around the shoulder) can be considered to lie very close to the face. Therefore, estimating, for example, a circle whose radius equals the length of the person's arm as the hand presence range allows the hand to be detected more efficiently. The arm length can in turn be estimated from, for example, the size of the detected face.
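As a rough sketch of this aspect, the circular hand presence range can be derived from the detected face box alone. The function below is illustrative; the `arm_ratio` constant relating face height to arm reach is an assumed tuning parameter, not a value fixed by the text.

```python
def estimate_hand_range(face_x, face_y, face_w, face_h, arm_ratio=2.0):
    """Estimate a circular hand presence range from a detected face box.

    The circle is centered on the face; its radius approximates the reach
    of the arm, derived from the face height (arm_ratio is a hypothetical
    tuning constant).
    """
    cx = face_x + face_w / 2.0
    cy = face_y + face_h / 2.0
    radius = arm_ratio * face_h
    return cx, cy, radius

def in_hand_range(px, py, circle):
    """True if point (px, py) lies inside the estimated circle."""
    cx, cy, r = circle
    return (px - cx) ** 2 + (py - cy) ** 2 <= r ** 2
```

Hand detection would then scan only pixels for which `in_hand_range` is true, instead of the whole image.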
In another aspect of the video processing apparatus according to the present embodiment, the apparatus further comprises learning means for learning a relative positional relationship between the person's face and the person's hand, and the range estimation means estimates the hand presence range based on the learned relative positional relationship.

According to this aspect, before the hand presence range is estimated, the relative positional relationship between a person's face and hand is learned in advance by referring to many images. Here, "relative positional relationship" covers not only the simple positional relationship between the face and the hand but also information about their sizes and orientations.

Estimating the hand presence range from such learning results makes the range over which the hand detection process is performed more appropriate, so the hand can be detected extremely efficiently. Moreover, if the detection process is executed starting from the ranges where a hand is most likely to be present (that is, the ranges in which hands were detected most frequently while learning the relative positional relationship), the hand can be detected even more efficiently.
In another aspect of the video processing apparatus according to the present embodiment, the hand detection means includes scale estimation means for estimating the size of the person's hand based on the information about the person's face.

According to this aspect, when a face is detected, the size of the hand is estimated from the detected face information. For example, if the detected face is large, the hand to be detected is presumed to be large as well; conversely, if the detected face is small, the hand to be detected is presumed to be small.

In this aspect, because the size of the hand to be detected is estimated in advance, the hand detection process can be performed extremely efficiently and accurately.
In another aspect of the video processing apparatus according to the present embodiment, the hand detection means includes rotation angle estimation means for estimating the rotation angle of the person's hand based on the information about the person's face.

According to this aspect, when a face is detected, the rotation angle of the hand (that is, its inclination) is estimated from the detected face information. For example, the hand's rotation angle is estimated to lie within a predetermined range of the detected face's rotation angle; in other words, the fact that the rotation angles of the face and the hand tend to take relatively close values can be exploited to estimate the hand's rotation angle.

In this aspect, because the rotation angle of the hand to be detected is estimated in advance, the hand detection process can be performed extremely efficiently and accurately.
In another aspect of the video processing apparatus according to the present embodiment, the hand detection means includes skin color region detection means for detecting skin-colored regions in the image, and detects the person's hand in the skin-colored regions within the hand presence range.

According to this aspect, once the hand presence range has been estimated, skin-colored regions are detected in the image before the hand detection process is performed. The skin-colored regions are detected based on, for example, preset skin color components, or they may be detected according to the color of the detected face.

Once the skin-colored regions have been detected, the hand detection process is applied to the skin-colored regions within the hand presence range. That is, even inside the hand presence range, regions that are not skin-colored are not processed; the hand detection process runs only in the skin-colored regions within the range.

The color of a person's hand can be assumed to be a skin color unless, for example, gloves are worn. Detecting the skin-colored regions beforehand therefore narrows the range over which the hand detection process must run, so in this aspect the hand can be detected extremely efficiently.
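A minimal sketch of such a skin color test, assuming simple per-channel RGB ranges; the particular thresholds and the R > G > B heuristic are illustrative assumptions, since the text leaves the color components to be preset in advance or adapted to the color of the detected face.

```python
import numpy as np

def skin_mask(rgb_image,
              r_range=(95, 255), g_range=(40, 200), b_range=(20, 180)):
    """Boolean mask of 'skin-colored' pixels in an (H, W, 3) RGB image.

    The per-channel ranges are illustrative defaults; in practice they
    would be preset from sample data or adapted to the detected face.
    """
    r = rgb_image[..., 0]
    g = rgb_image[..., 1]
    b = rgb_image[..., 2]
    return ((r >= r_range[0]) & (r <= r_range[1]) &
            (g >= g_range[0]) & (g <= g_range[1]) &
            (b >= b_range[0]) & (b <= b_range[1]) &
            (r > g) & (g > b))  # skin tones tend to satisfy R > G > B
```

Only pixels where this mask is true, inside the hand presence range, would then be passed to the hand detector.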
In another aspect of the video processing apparatus according to the present embodiment, the hand detection means includes difference region detection means for comparing a past image with the current image and detecting a difference region in which the difference in luminance is equal to or greater than a predetermined value, and detects the person's hand in the difference region within the hand presence range.

According to this aspect, once the hand presence range has been estimated, the difference region is detected before the hand detection process is performed. The difference region is the region in which, comparing the past image and the current image, the luminance difference is equal to or greater than a predetermined value. Here, the "predetermined value" is set according to the difference in luminance between portions where a hand is present and portions where it is not.

Once the difference region has been detected, the hand detection process is applied to the difference region within the hand presence range. That is, even inside the hand presence range, regions that are not part of the difference region are not processed; the hand detection process runs only in the difference region within the range.

A person's hand can be assumed to move more than other objects in the image, so its position is likely to differ between the past image and the current image. A hand is therefore likely to be present in regions where the luminance difference is at least the predetermined value (that is, regions whose luminance has changed substantially). Detecting the difference region beforehand thus narrows the range over which the hand detection process must run, so in this aspect the hand can be detected extremely efficiently.
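The difference region described here can be sketched as a per-pixel luminance comparison between consecutive frames; the threshold value below is an assumed placeholder for the "predetermined value" mentioned above.

```python
import numpy as np

def difference_region(prev_luma, curr_luma, threshold=30):
    """Pixels whose luminance changed by at least `threshold` between
    the previous frame and the current frame.

    Inputs are 2-D uint8 luminance images; the threshold is an assumed
    value standing in for the text's predetermined value.
    """
    # Widen to a signed type so the subtraction cannot wrap around.
    diff = np.abs(curr_luma.astype(np.int16) - prev_luma.astype(np.int16))
    return diff >= threshold
```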
A video processing method according to the present embodiment comprises: a face detection step of detecting a person's face from an image; a range estimation step of estimating, based on information about the detected face of the person, a hand presence range in which the person's hand can exist; and a hand detection step of performing a process of detecting the person's hand within the hand presence range.

According to the video processing method of the present embodiment, as with the video processing apparatus of the present invention described above, a person's face is detected first, and the hand detection process is then performed within the hand presence range estimated from the detected face information. A person's hand in a video can therefore be detected efficiently and accurately.

The video processing method of the present invention can also adopt the various aspects described above for the video processing apparatus of the present invention.

The operation and further advantages of the various embodiments of the video processing apparatus and method according to the present invention will be explained in more detail in the examples below.
Embodiments of the present invention will now be described in detail with reference to the drawings.
<First embodiment>

First, the configuration of the video processing apparatus according to the first embodiment will be described with reference to FIG. 1. FIG. 1 is a block diagram showing the configuration of the video processing apparatus according to the first embodiment.
As shown in FIG. 1, the video processing apparatus according to the first embodiment comprises a video input unit 110, a face detection unit 120, a hand presence range estimation unit 130, a hand detection unit 140, and an output unit 150.

The video input unit 110 feeds the video to be processed into the video processing apparatus from outside. The video is input as, for example, a sequence of consecutive images.
The face detection unit 120 is an example of the "face detection means" of the present invention; it scans the input image and detects the face of any person appearing in the image.
The hand presence range estimation unit 130 is an example of the "range estimation means" of the present invention; it estimates the hand presence range (that is, the range in which the person's hand can exist) based on the detected face information, such as the position, size, and orientation of the face.
The hand detection unit 140 is an example of the "hand detection means" of the present invention; it performs the hand detection process within the estimated hand presence range and detects the person's hand.
The output unit 150 outputs information about the detected hand. The output hand information is used, for example, to operate other devices according to the behavior of the hand (for example, televisions, air conditioners, lighting fixtures, personal computers, game consoles, mobile devices, ticket machines, information terminals in public spaces such as digital signage, home equipment, and in-vehicle devices).
Next, the operation of the video processing apparatus according to the first embodiment will be described with reference to FIGS. 2 to 8 in addition to FIG. 1. FIG. 2 is a flowchart showing the operation of the video processing apparatus according to the first embodiment, FIG. 3 is a plan view showing an example of an input image, and FIG. 4 is a plan view conceptually showing the face detection method.
As shown in FIG. 2, when the video processing apparatus according to the first embodiment operates, the video input unit 110 first acquires an image constituting the video (step S01). The following description assumes that an image showing the face 210 and hand 220 of a person 200, as in FIG. 3, has been acquired.
When the image has been acquired, the face detection unit 120 detects the face 210 of the person 200 in the image (step S02). For example, the face detection unit 120 scans the entire acquired image and detects, as the face, a portion whose characteristics, such as luminance, take predetermined values recognizable as a human face.
As shown in FIG. 4, the face 210 is detected by drawing a rectangle that encloses it. At this time, the coordinates (x, y) of the upper-left corner of the rectangle enclosing the face 210, together with the width w and height h of the face 210, are obtained. Various other information, such as the orientation of the face, may also be detected in addition to these parameters.
Returning to FIG. 2, when the face 210 has been detected, the hand presence range is estimated based on the information about the detected face 210. The estimation method is described in detail below with reference to FIGS. 5 to 8. FIG. 5 is a flowchart showing the method of estimating the hand presence range, FIG. 6 is a plan view conceptually showing that method, FIG. 7 is a conceptual diagram showing how the scale of the hand to be detected is estimated, and FIG. 8 is a conceptual diagram showing how the rotation angle of the hand to be detected is estimated.
As shown in FIG. 5, the hand presence range estimation unit 130 estimates a circle centered on the detected face as the hand presence range (step S11). Specifically, a circle P1 such as that shown in FIG. 6 is estimated as the hand presence range.

Owing to the structure of the human body, a person's hand cannot be located extremely far from the face. That is, if the position of the person's face is known, the hand can exist only within the reach of the arm extending from it, so estimating, for example, a circle whose radius equals the length of the person's arm as the hand presence range allows the hand to be detected more efficiently. Here, the radius of the circle P1 is determined according to the size of the detected face 210; more specifically, it is set to twice the height h of the face 210.
When the hand presence range has been estimated, the scale of the hand to be detected is estimated based on the size of the detected face 210 (step S12). The hand scale is estimated to lie, for example, in the range of 0.7 to 1.5 times the height h of the detected face 210. Once the scale has been estimated, templates at five levels (ア to オ) are set, as shown for example in FIG. 7.

Subsequently, the rotation angle of the hand to be detected is estimated based on the rotation angle of the detected face 210 (step S13). The hand's rotation angle is estimated to lie, for example, within the range of -45 to +45 degrees of the rotation angle of the detected face 210. Once the rotation angle has been estimated, templates at seven levels (い to と) are set, as shown for example in FIG. 8.

When the hand scale and rotation angle have been estimated, it is determined whether the processing of steps S11 to S13 has been completed for every face in the image (step S14). If it has not (step S14: NO), steps S11 to S13 are performed again for the remaining faces. If it has (step S14: YES), the combinations of range, scale, and rotation angle over which hand detection will actually be performed are set (step S16). That is, the parameters estimated in steps S11 to S13 are combined with one another to set the conditions under which the hand is actually detected.
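Steps S12, S13, and S16 can be sketched as follows: five scale hypotheses spanning 0.7 to 1.5 times the face height, seven rotation hypotheses spanning plus or minus 45 degrees around the face angle, and their Cartesian product as the set of detection conditions. The even spacing of the levels is an assumption; the text gives only the ranges and the number of levels.

```python
import itertools

def hand_hypotheses(face_h, face_angle=0.0):
    """Enumerate (scale, rotation) template settings for one detected face.

    Scales span 0.7-1.5 times the face height in five levels and
    rotations span +/-45 degrees around the face angle in seven levels,
    matching steps S12 and S13; the even spacing is an assumption.
    """
    scales = [face_h * s for s in (0.7, 0.9, 1.1, 1.3, 1.5)]
    angles = [face_angle + a for a in (-45, -30, -15, 0, 15, 30, 45)]
    # Step S16: every (scale, rotation) pair becomes a detection condition.
    return list(itertools.product(scales, angles))

hyps = hand_hypotheses(60)  # face height h = 60 px, upright face
```

For multiple faces, the same enumeration would simply be repeated per face before detection begins.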
Returning to FIG. 2, once the conditions for hand detection have been set as described above, the hand detection unit 140 performs the hand detection process (step S04). The hand detection unit 140 detects the hand 220 in the image by matching against template images prepared in advance (see FIGS. 7 and 8). The hand may also be detected using, for example, a method based on histograms of luminance gradient directions or a method based on the boundary shape of skin-colored regions.
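One way to picture the range-restricted matching of step S04 is a sliding-window search that simply skips windows whose center falls outside the circle P1. The sum-of-squared-differences score below is a stand-in for whichever matching method is actually employed; it is a sketch, not the patent's implementation.

```python
import numpy as np

def match_in_range(image, template, circle):
    """Slide `template` over `image`, scoring only windows whose center
    lies inside the circular hand presence range.

    Returns the (row, col) of the best window's top-left corner, or None
    if no window falls inside the range. Scoring uses SSD as a simple
    stand-in for template matching.
    """
    cx, cy, r = circle
    th, tw = template.shape
    best, best_pos = None, None
    for y in range(image.shape[0] - th + 1):
        for x in range(image.shape[1] - tw + 1):
            wy, wx = y + th / 2.0, x + tw / 2.0   # window center
            if (wx - cx) ** 2 + (wy - cy) ** 2 > r ** 2:
                continue  # outside the hand presence range: skip entirely
            window = image[y:y + th, x:x + tw].astype(np.int32)
            score = int(((window - template) ** 2).sum())
            if best is None or score < best:
                best, best_pos = score, (y, x)
    return best_pos
```

The efficiency gain described in the text comes from the `continue`: every window outside the circle costs one distance check instead of a full match.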
If the hand presence range described above (see FIG. 6) were not estimated, the process of detecting the hand 220 would have to be performed over the entire image. In that case, because the range to be processed is larger, both the time needed to detect the hand 220 and the load imposed by the detection process would increase.

In the video processing apparatus according to the present embodiment, by contrast, the process of detecting the hand 220 is performed only within the hand presence range of the image, so the hand can be detected extremely efficiently. Furthermore, because the detection process runs only within the hand presence range, another person's hand located outside that range, for example, is prevented from being detected as the hand of the person of interest. That is, false detections are prevented, and the person's hand can be detected accurately.
When detection of the hand 220 has finished, the output unit 150 outputs the information about the detected hand as the result (step S05), for example the position, scale, and rotation angle of the detected hand 220. If multiple hands 220 have been detected, results are output for all of them.

As described above, according to the video processing apparatus of the first embodiment, the hand presence range is estimated based on the detected face information. A person's hand in a video can therefore be detected efficiently and accurately.
<Second embodiment>

Next, a video processing apparatus according to the second embodiment will be described with reference to FIGS. 9 and 10. FIG. 9 is a block diagram showing the configuration of the video processing apparatus according to the second embodiment, and FIG. 10 is a flowchart showing its operation. The second embodiment differs from the first embodiment described above in part of its configuration and operation and is otherwise largely the same. The following description therefore covers the parts that differ from the first embodiment in detail and omits the overlapping parts as appropriate.
As shown in FIG. 9, the video processing apparatus according to the second embodiment comprises, in addition to the same components as the first embodiment, a skin color region detection unit 160, a difference region detection unit 170, and a memory 175.

The skin color region detection unit 160 is an example of the "skin color region detection means" of the present invention; it detects skin-colored regions in the image. The skin-colored regions are detected based on, for example, preset skin color components, or they may be detected according to the color of the detected face 210.

The difference region detection unit 170 is an example of the "difference region detection means" of the present invention; it detects a difference region in which the difference in luminance between a past image stored in the memory 175 (for example, the image one frame earlier) and the current image is equal to or greater than a predetermined value.
As shown in FIG. 10, when the video processing apparatus according to the second embodiment operates, the video input unit 110 first acquires an image constituting the video (step S21). When the image has been acquired, the face detection unit 120 detects the face 210 of the person 200 in the image (step S22). That is, the same processing as in the first embodiment is performed.

In the second embodiment, in particular, the skin color region detection unit 160 then detects the skin-colored regions in the image (step S23), and the difference region detection unit 170 detects the difference region in the image using the past image stored in the memory 175 (step S24).
When the skin-colored regions and the difference region have been detected, the circle P1 centered on the detected face 210 is estimated as the hand presence range, as shown in FIG. 6 (step S25). Here, however, regions that are not skin-colored and regions that are not part of the difference region are excluded from the estimated range. That is, the hand presence range becomes the portion of the circle P1 centered on the face 210 that is both skin-colored and part of the difference region.

The color of the hand 220 can be assumed to be a skin color unless, for example, gloves are worn. The hand 220 can also be assumed to move more than other objects in the image, so its position is likely to differ between the past image and the current image, and the hand 220 is likely to be present in regions where the luminance difference is at least the predetermined value (that is, regions whose luminance has changed substantially). Consequently, detecting the skin-colored regions and the difference region beforehand further narrows the range over which the hand detection process must run (that is, the hand presence range).
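The narrowing described in step S25 amounts to intersecting three boolean masks: inside the circle P1, skin-colored, and part of the difference region. A sketch under that reading:

```python
import numpy as np

def hand_search_region(shape, circle, skin, diff):
    """Final search region of the second embodiment.

    A pixel is searched only if it lies inside the circle around the
    face AND is skin-colored AND changed since the last frame. `skin`
    and `diff` are boolean masks of the given (height, width) shape.
    """
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]          # per-pixel coordinates
    cx, cy, r = circle
    inside = (xs - cx) ** 2 + (ys - cy) ** 2 <= r ** 2
    return inside & skin & diff
```

The skin and difference masks could come from routines like those sketched earlier; only their intersection with the circle is handed to the detector.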
When the hand presence range has been estimated, the hand detection unit 140 performs the hand detection process (step S26), and the output unit 150 outputs the information about the detected hand as the result (step S27).

As described above, according to the video processing apparatus of the second embodiment, the hand presence range is estimated using the skin-colored regions and the difference region. A person's hand in a video can therefore be detected even more efficiently and accurately.
<Third embodiment>

Next, a video processing apparatus according to the third embodiment will be described with reference to FIGS. 11 to 13. FIG. 11 is a block diagram showing the configuration of the video processing apparatus according to the third embodiment, FIG. 12 is a conceptual diagram showing a plurality of images used for learning, and FIG. 13 is a plan view conceptually showing a hand presence range estimated based on learning. The third embodiment differs from the first and second embodiments described above in part of its configuration and operation and is otherwise largely the same. The following description therefore covers the parts that differ from the first and second embodiments in detail and omits the overlapping parts as appropriate.
As shown in FIG. 11, the video processing apparatus according to the third embodiment comprises, in addition to the same components as the first embodiment, a learning unit 180 and a learning storage unit 185.

The learning unit 180 is an example of the "learning means" of the present invention; it learns the relative positional relationship between a person's face and hand from the images stored in the learning storage unit 185. The learning storage unit 185 stores a plurality of images, as shown for example in FIG. 12.

Learning in the learning unit 180 may be performed each time the video input unit 110 acquires an image, or it may be performed in advance using predetermined images. That is, the learning unit 180 may learn in real time while the apparatus operates, or it may simply store the results of learning performed beforehand.
As shown in FIG. 13, unlike in the first and second embodiments described above, the hand presence range estimation unit 130 of the third embodiment estimates the hand presence range based on the learning results of the learning unit 180. Specifically, as shown in the figure, a range P2 in which the hand 220 is highly likely to be present, given the position of the face 210, is taken as the hand presence range. In FIG. 13, the inner circles indicate higher probabilities of a hand being present, and the hand detection process is performed in order from the inner circles (that is, from the regions where a hand is most likely to be present).

Estimating the hand presence range from the learning results yields a more appropriate range than estimating a circle P1 centered on the face 210 as the hand presence range, as in FIG. 6. Specifically, the portions above and to the right of the face 210 in the figure, where the probability of a hand being present is low, can be excluded, so the hand presence range can be estimated as a narrower range. The efficiency of the hand detection process can thus be increased further.
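A simple way to realize such a learned range is a 2-D histogram of hand positions relative to the face, normalized by face height so faces of different sizes can be pooled, with detection visiting the most frequent cells first. The bin count, reach, and normalization below are assumptions for illustration; the patent does not specify the representation.

```python
import numpy as np

def learn_offset_histogram(samples, bins=8, reach=2.0):
    """Accumulate hand positions relative to the face into a 2-D histogram.

    Each sample is (face_x, face_y, face_h, hand_x, hand_y). Offsets are
    expressed in face-height units and clipped to +/-`reach`; `bins` and
    `reach` are assumed parameters.
    """
    hist = np.zeros((bins, bins), dtype=np.int64)
    for fx, fy, fh, hx, hy in samples:
        u = (hx - fx) / fh                 # horizontal offset, face units
        v = (hy - fy) / fh                 # vertical offset, face units
        i = int((v + reach) / (2 * reach) * bins)
        j = int((u + reach) / (2 * reach) * bins)
        if 0 <= i < bins and 0 <= j < bins:
            hist[i, j] += 1
    return hist

def search_order(hist):
    """Cell indices sorted so the most frequently observed offsets
    (the most probable hand locations) are scanned first."""
    flat = np.argsort(hist, axis=None)[::-1]
    return [tuple(np.unravel_index(int(k), hist.shape)) for k in flat]
```

At run time, a detected face would map each histogram cell back to an image region, and detection would proceed in the order returned by `search_order`.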
As described above, according to the video processing apparatus of the third embodiment, the hand presence range is estimated based on learning results. A person's hand in a video can therefore be detected even more efficiently and accurately.

The present invention is not limited to the embodiments described above and may be modified as appropriate without departing from the gist or spirit of the invention as read from the claims and the specification as a whole; video processing apparatuses and methods incorporating such modifications are also within the technical scope of the present invention.
110 Video input unit
120 Face detection unit
130 Hand presence range estimation unit
140 Hand detection unit
150 Output unit
160 Skin color region detection unit
170 Difference region detection unit
175 Memory
180 Learning unit
185 Learning storage unit
200 Person
210 Face
220 Hand
Claims (8)

A video processing apparatus comprising:
face detection means for detecting a person's face from an image;
range estimation means for estimating, based on information about the detected face of the person, a hand presence range in which the person's hand can exist; and
hand detection means for performing a process of detecting the person's hand within the hand presence range.

The video processing apparatus according to claim 1, further comprising learning means for learning a relative positional relationship between the person's face and the person's hand, wherein the range estimation means estimates the hand presence range based on the learned relative positional relationship.

A video processing method comprising:
a face detection step of detecting a person's face from an image;
a range estimation step of estimating, based on information about the detected face of the person, a hand presence range in which the person's hand can exist; and
a hand detection step of performing a process of detecting the person's hand within the hand presence range.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2010-169673 | 2010-07-28 | ||
| JP2010169673 | 2010-07-28 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2012014590A1 true WO2012014590A1 (en) | 2012-02-02 |
Family
ID=45529811
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/JP2011/063701 Ceased WO2012014590A1 (en) | 2010-07-28 | 2011-06-15 | Video processing device and method |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2012014590A1 (en) |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2005216061A (en) * | 2004-01-30 | 2005-08-11 | Sony Computer Entertainment Inc | Image processor, image processing method, recording medium, computer program and semiconductor device |
| JP2006155563A (en) * | 2004-11-05 | 2006-06-15 | Fuji Xerox Co Ltd | Motion analysis device |
| WO2009018161A1 (en) * | 2007-07-27 | 2009-02-05 | Gesturetek, Inc. | Enhanced camera-based input |
2011
- 2011-06-15 WO PCT/JP2011/063701 patent/WO2012014590A1/en not_active Ceased
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JPWO2014073384A1 (en) * | 2012-11-06 | 2016-09-08 | 株式会社ソニー・インタラクティブエンタテインメント | Information processing device |
| US9672413B2 (en) | 2012-11-06 | 2017-06-06 | Sony Corporation | Setting operation area for input according to face position |
| JP2023512359A (en) * | 2020-12-29 | 2023-03-27 | 商▲湯▼国▲際▼私人有限公司 | Associated object detection method and apparatus |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 11812183; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 11812183; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: JP |