JP2014093023A - Object detection device, object detection method and program

Info

Publication number: JP2014093023A
Application number: JP2012244382A
Authority: JP
Inventors: Kaname Tomite; 要冨手; Hiroshi Torii; 寛鳥居
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2012-11-06
Filing date: 2012-11-06
Publication date: 2014-05-19

Abstract

PROBLEM TO BE SOLVED: To accurately detect an object in many situations where shielding occurs.
SOLUTION: An object detection unit 101 obtains a whole-body detection result and a head detection result (detection scores) for a human body to be detected. Using these results, a shielding determination unit 102 determines whether the human body being detected is shielded. The object detection unit 101 then corrects the detection score according to the shielding state. To do so, it is determined whether the head detection result of a foreground human body overlaps the whole-body detection result of a background human body. If an overlapping area exists, the shielded area is calculated and the detection score is corrected according to the calculated shielded area.

Description

In particular, the present invention relates to an object detection apparatus, an object detection method, and a program that are suitable for use when shielding occurs.

Techniques that accurately and robustly detect a person in a single image, even when the person is partially shielded, can be applied to motion analysis and similar tasks, and have been actively researched in recent years. Applications of such techniques are being studied particularly in fields such as security systems, safe-driving support, and medical welfare. Accordingly, in fields such as surveillance cameras and vehicle-mounted cameras, methods are known for robustly detecting a human body when the human body in the image is shielded.

For example, in the method disclosed in Patent Document 1, regions whose detection score exceeds a preset human-body detection condition are first detected from the entire input image. These regions correspond to the human body located nearest the foreground. Next, a peripheral search region in which shielding may occur is calculated with reference to such a region. Detection is then performed within the peripheral search region under a condition looser than the preset one, so that a background person shielded by the foreground human body can be detected.

In the method disclosed in Patent Document 2, a region where a human body is partially shielded by a high-luminance object such as external light or a guardrail is detected using image contrast information. The human body is then detected robustly by calculating a score that takes into account the fact that the detected region is a shielded region.

In the method disclosed in Non-Patent Document 1, an occluder, which indicates whether a shielding object is present, is explicitly incorporated as one of the parts constituting the human body, so that the human body is detected robustly even when a shielding object is present. With this occluder, the shielding state is determined by detecting the strong straight edge that arises when a human body is hidden by a desk, table, or the like.

Patent Document 1: JP 2010-49435 A
Patent Document 2: JP 2011-165170 A

Non-Patent Document 1: "Object Detection with Grammar Models", Proceedings of the Neural Information Processing Systems (NIPS) 2011.
Non-Patent Document 2: P. Viola and M. Jones (2001). "Rapid Object Detection using a Boosted Cascade of Simple Features", IEEE Conference on Computer Vision and Pattern Recognition.
Non-Patent Document 3: Dalal, N., & Triggs, B. (2005). "Histograms of oriented gradients for human detection", IEEE CVPR.
Non-Patent Document 4: Platt, J. C. (1999). Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods. Advances in Large Margin Classifiers.
Non-Patent Document 5: Zadrozny, B., & Elkan, C. (2002). Transforming classifier scores into accurate multiclass probability estimates. Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
Non-Patent Document 6: P. Felzenszwalb, D. McAllester, D. Ramanan (2008). "A Discriminatively Trained, Multiscale, Deformable Part Model", IEEE Conference on Computer Vision and Pattern Recognition.

However, Patent Document 1 does not explicitly determine shielding. Instead, for a human body that is close to a detected human body and likely to be shielded, the detection threshold is lowered to make detection easier. As a result, false detections increase and detection accuracy decreases.

Patent Document 2 relies on the property that the object shielding the human body is a high-luminance object and detects the shielded region from contrast information alone. It therefore cannot handle shielding that does not necessarily produce high contrast, such as shielding caused by overlapping human bodies.

Non-Patent Document 1 targets strong straight edges such as those of a desk or table. It can therefore handle shielding of a human body by a desk or table, but not shielding whose shape deforms freely (for example, another person standing in front of the detection target).

In view of the above problems, an object of the present invention is to enable accurate detection of an object in many situations where shielding occurs.

An object detection apparatus according to the present invention detects, from an input image, a first object located in the foreground and a second object located in the background, and comprises: first detection means for detecting partial regions of the first and second objects; second detection means for detecting information indicating the postures of the first and second objects; determination means for determining the shielding state of the second object by the first object, based on the partial regions detected by the first detection means and the information detected by the second detection means; and correction means for correcting the detection result of the second object according to the shielding state determined by the determination means.

According to the present invention, an object can be detected accurately in many situations where shielding occurs.

FIG. 1 is a block diagram showing a simple configuration example of the object detection apparatus according to the embodiment.
FIG. 2 is a block diagram showing a detailed configuration example of the object detection apparatus according to the embodiment.
FIG. 3 is a flowchart showing an example of the processing procedure performed by a detection processing unit of the object detection apparatus according to the embodiment.
FIG. 4 is a diagram for explaining an outline of estimating the position of the head from the result of each detector.
FIG. 5 is a diagram showing a definition example of the positional relationship between the result of the whole-body detector and the head.
FIG. 6 is a diagram explaining the process of evaluating a head position estimation result using a correct-answer criterion for the head.
FIG. 7 is a flowchart showing an example of the integration processing procedure performed by the integration result output unit in the embodiment.
FIG. 8 is a diagram explaining a specific example of the processing result output from the integration result output unit.
FIG. 9 is a diagram explaining how a background human body is partially shielded by a foreground human body.
FIG. 10 is a diagram explaining the method of determining the shielding state in FIG. 9 and the method of calculating the shielded area.
FIG. 11 is a diagram showing an example of a state in which two or more persons overlap and shielding occurs.
FIG. 12 is a flowchart showing an example of the processing procedure performed by the shielding determination unit in the first embodiment.
FIG. 13 is a diagram explaining a whole-body detector that uses a parts-based detection method.
FIG. 14 is a diagram explaining the relationship between a parts-based detector and the body axis of a human body.
FIG. 15 is a diagram explaining an example in which shielding is determined from the head estimation results of a plurality of part detectors and the body-axis estimation results.
FIG. 16 is a diagram explaining a method of calculating the area and position of the shielded region.
FIG. 17 is a flowchart showing an example of the processing procedure performed by the shielding determination unit in the second embodiment.
FIG. 18 is a diagram explaining the detection processing when a background human body is shielded by a foreground human body and the heads of both human bodies overlap.
FIG. 19 is a flowchart showing an example of the processing procedure performed by the shielding determination unit in the third embodiment.
FIG. 20 is a diagram showing an example of head and body-axis detection results before integration processing.

(First embodiment)
In the present embodiment, an object is divided into a plurality of partial regions, and the object-likeness of each partial region is calculated as a partial region score. Some or all of the partial region scores are then integrated into an integrated score, and object detection is performed by determining, based on the integrated score, whether the region contains the object. Furthermore, position and orientation information for a plurality of objects is calculated from the partial region scores and the integrated score obtained during detection, and shielding determination is performed, which achieves robust detection. The object to be detected in the present embodiment is not particularly limited; the case where the detection target is a person is described below.

FIGS. 1 and 2 are block diagrams showing configuration examples of the object detection apparatus 10 according to the present embodiment. The configuration of the present embodiment is described below with reference to FIGS. 1 and 2. The object detection apparatus 10 can be realized by executing software (a program), acquired via a network or various recording media, on a control device composed of a CPU, a memory, a storage device, input/output devices, a bus, a display device, and the like. The control device (not shown) may be a general-purpose control device or hardware designed optimally for the software of the present invention.

<Configuration related to detection processing>
As shown in FIG. 1, in the detection processing, the object detection apparatus 10 of the present embodiment outputs the detection result of an object via an image input unit 100, an object detection unit 101, and a shielding determination unit 102. These constituent blocks are described below.

The image input unit 100 inputs an image to the image processing apparatus. The input image may be one frame of a moving image obtained from a camera or the like, or an image stored in a storage device such as a hard disk drive.

Processing for an attention region of the input image is described below. The attention region is a partial region of the input image and has the same image size as the object detector described later. By setting the attention region sequentially while sliding it across the input image, objects can be detected over the entire image. In addition, by enlarging or reducing the input image, objects captured at various sizes can be detected.
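The scan over attention regions can be sketched as follows. This is a minimal illustration only: the function name `sliding_windows`, the `stride` parameter, and the `scales` tuple are assumptions introduced here, not values given in the specification.

```python
from typing import Iterator, Sequence, Tuple

# (scale, x1, y1, x2, y2), with the box expressed in original-image coordinates
Window = Tuple[float, int, int, int, int]


def sliding_windows(image_w: int, image_h: int, win_w: int, win_h: int,
                    stride: int = 8,
                    scales: Sequence[float] = (1.0, 0.5)) -> Iterator[Window]:
    """Yield attention regions of detector size over rescaled copies of the image.

    The detector window stays fixed at (win_w, win_h); resizing the image by
    `scale` lets the same window cover persons of different apparent sizes.
    """
    for scale in scales:
        scaled_w = int(image_w * scale)
        scaled_h = int(image_h * scale)
        # Slide the window over the rescaled image; if the rescaled image is
        # smaller than the window, the ranges are empty and nothing is yielded.
        for y in range(0, scaled_h - win_h + 1, stride):
            for x in range(0, scaled_w - win_w + 1, stride):
                # Map the window back to original-image coordinates.
                yield (scale,
                       round(x / scale), round(y / scale),
                       round((x + win_w) / scale), round((y + win_h) / scale))
```

With a 64x128 window, a window taken from the image reduced to half size corresponds to a 128x256 region of the original image, which is how larger persons would be covered in this sketch.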

The object detection unit 101 calculates the estimated position and score (likelihood) of the object to be detected. Details of the object detection unit 101 are described below with reference to FIG. 2.

As shown in FIG. 2, the object detection unit 101 includes the image input unit 100, a first detection processing unit 211 to an n-th detection processing unit 21n, and a first common part estimation unit 221 to an n-th common part estimation unit 22n corresponding to the respective detection processing units. It further includes a first score correction dictionary 231 to an n-th score correction dictionary 23n, a first score correction unit 241 to an n-th score correction unit 24n, and an integration result output unit 250. Each component is described below, taking as an example the processing of one image input to the image input unit 100.

The first detection processing unit 211 to the n-th detection processing unit 21n store in advance detectors that detect different parts or states of the object. When the detection target is a person, the detectors can target different parts of the person, for example a face detector, a head detector, an upper-body detector, and a whole-body detector. Using detectors for different parts makes it possible to detect a person even when part of the person is shielded by another object or extends beyond the image.

It is desirable to prepare these detectors so that they complement one another. One example is the combination of a head detector and a whole-body detector. The head detector can detect a person even when the torso and below are shielded by another object, and it is not affected by posture changes of the body. Its drawback is that the head has few characteristic shapes, so its detection performance tends to be inferior to that of the whole-body detector. The whole-body detector covers a large region and therefore captures the characteristics of a person easily and has relatively high detection performance, but it is vulnerable to shielding. Using the head detector and the whole-body detector together allows each to compensate for the other's drawbacks, so improved person detection accuracy can be expected.

For the face detector, as in the method disclosed in Non-Patent Document 2, Haar-like features are collected from the face regions of training images, and the face detector is trained with AdaBoost so that face-like features can be discriminated statistically. For other parts of a person such as the head, upper body, and whole body, the HOG features described in Non-Patent Document 3 are used as image features. To prepare a head detector, upper-body detector, or whole-body detector, training images of each part are prepared, their HOG features are extracted, and a detector for each part is trained with a classifier such as an SVM (support vector machine) or AdaBoost. The training result (for example, the weak classifiers of AdaBoost) is stored as a detector dictionary and used at detection time.

Each detector calculates the likelihood of person detection as a detection score. For example, AdaBoost outputs the weighted sum of the outputs of the weak classifiers as the detection score, and an SVM calculates the distance from the separating hyperplane as the detection score. Any other discrimination method may be used as long as it outputs a score representing object-likeness, such as a likelihood. The detection scores are preferably converted, for example into probability values indicating the object, so that the scores of different detectors can be compared. In the following, a higher detection score means that the output more strongly resembles the person part or person state targeted by each detector.
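The two score definitions above, and one possible conversion to a comparable probability, can be sketched as follows. The sigmoid mapping is a Platt-style calibration chosen here for illustration; the parameters `a` and `b` are placeholder assumptions that would normally be fitted on held-out data, and all function names are hypothetical.

```python
import math
from typing import Sequence


def adaboost_score(weak_outputs: Sequence[float], alphas: Sequence[float]) -> float:
    """Weighted sum of weak-classifier outputs (each typically +1 or -1)."""
    return sum(a * h for a, h in zip(alphas, weak_outputs))


def svm_score(features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Signed distance of the feature vector from the separating hyperplane."""
    norm = math.sqrt(sum(w * w for w in weights))
    return (sum(w * x for w, x in zip(weights, features)) + bias) / norm


def to_probability(score: float, a: float = -1.0, b: float = 0.0) -> float:
    """Platt-style sigmoid p = 1 / (1 + exp(a * score + b)).

    `a` and `b` are illustrative defaults, not fitted values. The two branches
    avoid overflow in exp() for large-magnitude scores.
    """
    z = a * score + b
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))
```

Once every detector's raw score has been mapped into the range 0 to 1 in this way, scores from a face detector and a whole-body detector can be compared on the same footing, provided each detector has its own fitted parameters.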

In the present embodiment, the case where three detectors (a face detector, a head detector, and a whole-body detector) are used is described below, but the detector configuration is not limited to this.

Next, the processing performed by the first detection processing unit 211, the second detection processing unit 212, and the n-th detection processing unit 21n is described. FIG. 3 is a flowchart showing an example of the processing procedure performed by the first detection processing unit 211 to the n-th detection processing unit 21n. An example in which the first detection processing unit 211 performs the detection processing of the whole-body detector is described below.

First, in step S301, the image features of the input image are calculated. Because the detector of the first detection processing unit 211 is the whole-body detector, HOG features are calculated from the input image. Next, in step S302, the image features at a specific position of the image to be processed are acquired. Then, in step S303, the detector dictionary is used to evaluate the object-likeness of those image features, and a detection score is calculated.

Next, in step S304, it is determined whether detection scores have been calculated over the entire input image. To search the entire image, the discrimination position is changed so that a detection score is calculated at each position in the image. If detection scores have not yet been calculated over the entire input image, the process returns to step S302. Changing the image size as well as the discrimination position makes it possible to detect persons appearing at different sizes in the image.

On the other hand, if the determination in step S304 shows that detection scores have been calculated over the entire input image, a detection score has been obtained for each position in the image. All of these results could be sent to the first common part estimation unit 221, but omitting subsequent processing for results with low detection scores, which can clearly be judged not to be a person, reduces the overall processing load. Therefore, in step S305, threshold processing retains only results with at least a predetermined score and discards the unneeded detection results. As a result of step S305, the position information of positions in the image whose detection score meets the predetermined value, together with their detection scores, is output to the first common part estimation unit 221.
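The loop of steps S302 to S305 can be sketched as follows, assuming that feature extraction and dictionary lookup are wrapped in a caller-supplied scoring function. The names `Detection`, `run_detection`, and `score_fn` are hypothetical and introduced only for this illustration.

```python
from typing import Callable, Iterable, List, NamedTuple, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in image coordinates


class Detection(NamedTuple):
    box: Box
    score: float


def run_detection(positions: Iterable[Box],
                  score_fn: Callable[[Box], float],
                  threshold: float) -> List[Detection]:
    """Score every candidate position, then keep results at or above the threshold.

    `score_fn` stands in for steps S302 and S303 (fetch the features at a
    position and evaluate them with the detector dictionary).
    """
    # S302 to S304: visit every position and compute its detection score.
    scored = [Detection(box, score_fn(box)) for box in positions]
    # S305: discard results that are clearly not a person.
    return [d for d in scored if d.score >= threshold]
```

Only the surviving `Detection` entries would then be passed on to the common part estimation stage, which keeps the later stages from processing obvious negatives.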

The processing of one detection processing unit has been described above; the object detection apparatus 10 as a whole repeats this processing for each of the detection processing units.

Next, the first common part estimation unit 221 to the n-th common part estimation unit 22n are described. These units estimate the position of a common part of the object from the result of each detector. In the present embodiment, to integrate the results of detectors that detect different parts, the position or range of a common part of the object is estimated from each detector's result, and the detection results are integrated based on the positional relationship of the estimated parts.

For example, each detection processing unit detects a human arm, leg, or torso, and the head is estimated as the common part. Estimating the common part from the results of a plurality of detection processing units also makes it possible to narrow down detection candidates in later processing. In the present embodiment, the head of a person is defined as the common part, and the procedure by which the first common part estimation unit 221 estimates the head position from the detection results is described below. Although the head is used as the common part here, any part that can be estimated in common from each detector may be used, for example the body axis connecting the base of the throat to the center of the waist. When the detection target is a person, the head is relatively unlikely to be shielded and is therefore well suited as the common part.

FIG. 4 is a diagram for explaining an outline of estimating the position of the head from the result of each detector. The detection processing of each detector yields information on the position and range of the detection target; in the present embodiment, this position and range are obtained as a rectangular frame surrounding the detection target. In the example of FIG. 4, the detection results are shown as rectangular frames: a face detection result frame 401, a head detection result frame 402, and a whole-body detection result frame 403. The coordinates $X$ of a rectangular frame are expressed using two image-coordinate points by the following equation (1):

$$X = (x_1, y_1, x_2, y_2) \tag{1}$$

Here, $x_1$ and $y_1$ are the image coordinates of the upper-left point of the rectangle, and $x_2$ and $y_2$ are the image coordinates of the lower-right point. The first common part estimation unit 221 estimates the head position and range, as the common part, from this rectangular frame. In the example of FIG. 4, the head position and range estimated from the face detection result frame 401 are represented by a rectangular frame 411, and those estimated from the whole-body detection result frame 403 are represented by a rectangular frame 413. To estimate the head position from a detection result frame, the positional relationship between the detection result frame and the head is defined in advance, and the detection result frame is converted into a head position.

FIG. 5 is a diagram showing a definition example of the positional relationship between the whole-body detector result and the head. In the example of FIG. 5, relative to the whole-body detection result, the head height $h_H$ is defined as 15% of the whole-body height $h_B$, and the head width $w_H$ as 50% of the whole-body width $w_B$. An offset of $0.25\,w_B$ is defined in the x-axis direction. To estimate the head position from the whole-body detector, the head coordinates $X_h$ are obtained from the coordinates $X$ of the whole-body result according to the definition shown in FIG. 5. The head coordinates $X_h$ are expressed by the following equation (2):

$$X_h = (x_{h1}, y_{h1}, x_{h2}, y_{h2}) \tag{2}$$

Here, $x_{h1}$ and $y_{h1}$ are the coordinates of the upper-left point of the estimated head range, and $x_{h2}$ and $y_{h2}$ are the coordinates of the lower-right point. The first common part estimation unit 221 calculates the estimated head coordinates $X_h$ from the coordinates $X$ of equation (1) for each detection result obtained by the first detection processing unit 211.
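The conversion from a whole-body frame to an estimated head frame can be sketched with the ratios of the FIG. 5 example (15% of the height, 50% of the width, x offset of 0.25 times the width). The text states no vertical offset, so this sketch assumes the head frame starts at the top edge of the whole-body frame; that assumption and the function name are not from the specification.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), upper-left and lower-right


def estimate_head_from_whole_body(body: Box,
                                  height_ratio: float = 0.15,
                                  width_ratio: float = 0.50,
                                  x_offset_ratio: float = 0.25) -> Box:
    """Convert a whole-body frame X into an estimated head frame X_h."""
    x1, y1, x2, y2 = body
    w_b = x2 - x1  # whole-body width
    h_b = y2 - y1  # whole-body height
    xh1 = x1 + x_offset_ratio * w_b
    yh1 = y1  # assumption: no vertical offset from the top of the body frame
    xh2 = xh1 + width_ratio * w_b
    yh2 = yh1 + height_ratio * h_b
    return (xh1, yh1, xh2, yh2)
```

With the default ratios the head frame is centered horizontally in the whole-body frame, since the 0.25 offset on the left leaves 0.25 of the width on the right.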

The head range may be defined by having a person enter and design each value in advance, or it may be designed from the average head position obtained from actual whole-body detection results. In the latter case, the whole-body detector is run on a plurality of sample images, and the average head position within the detection results is calculated.
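Deriving the head definition from samples can be sketched as follows, assuming each sample pairs a whole-body detection frame with a head frame annotated inside it. The normalization by body width and height, and the function name, are illustrative assumptions; the specification only states that an average head position is computed.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def average_head_ratios(samples: Sequence[Tuple[Box, Box]]) -> Tuple[float, float, float, float]:
    """Return mean (x_offset, y_offset, width, height) of the head frame,
    each expressed as a ratio of the whole-body frame size.

    `samples` holds (whole_body_box, head_box) pairs.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    sums = [0.0, 0.0, 0.0, 0.0]
    for (bx1, by1, bx2, by2), (hx1, hy1, hx2, hy2) in samples:
        w_b = bx2 - bx1
        h_b = by2 - by1
        ratios = ((hx1 - bx1) / w_b, (hy1 - by1) / h_b,
                  (hx2 - hx1) / w_b, (hy2 - hy1) / h_b)
        sums = [s + r for s, r in zip(sums, ratios)]
    n = len(samples)
    return (sums[0] / n, sums[1] / n, sums[2] / n, sums[3] / n)
```

Expressing the averages as ratios rather than pixels keeps the learned definition independent of the size at which each person appears in the sample images.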

The operation of the first common part estimation unit 221 has been described above using the estimation of the head position from the whole-body detector as an example. When the head position is estimated from the results of other detectors, the positional relationship between each detection result and the head is likewise defined in advance, and the head position is estimated from the detection result. With the whole-body detector the estimated head position lies inside the detection result, but the estimated position need not lie inside the detection result.

For example, the rectangular frame 411, which indicates the head position estimated from the face detection result frame 401 shown in FIG. 4, lies outside the face detection result frame 401. For the head detection result frame 402 of the head detector, which detects the head itself, the processing of the common part estimation unit may be omitted and the head detection result itself output as the estimated common part.

Next, the first score correction dictionary 231 to the n-th score correction dictionary 23n and the first score correction unit 241 to the n-th score correction unit 24n will be described. In the present embodiment, a plurality of different detection results are integrated using the position of the common part estimated from each detection result together with each detection score. The position of the common part is itself an estimate derived from a detection result, and the accuracy of that estimate differs from detector to detector. In this embodiment the head position is estimated as the common part, and the estimate is expected to be better for a detector whose target is close to the head or closely related to it. To integrate the results while taking this difference into account, the first score correction unit 241 to the n-th score correction unit 24n correct the detection scores according to the difference in common part estimation performance, using the first score correction dictionary 231 to the n-th score correction dictionary 23n, respectively. Integrating neighboring detection results with the corrected detection scores can be expected to improve the positional accuracy of the detection result for the target object.

The first score correction unit 241 to the n-th score correction unit 24n convert the detection score of each detector using the information recorded in the first score correction dictionary 231 to the n-th score correction dictionary 23n, respectively. These dictionaries store information for correcting the detection score based on the reliability with which each detector estimates the common part.

For score correction, a correction coefficient is stored in each score correction dictionary for each detector, and at correction time the detection score is multiplied by the coefficient to obtain the corrected score. As an example, the correction coefficient may be 1 for the head detector, 0.8 for the face detector, and 0.5 for the whole-body detector. In this way, a large coefficient is set for a detector close to the head (one with high head position estimation performance), and a small coefficient is set for a detector far from the head (one with low head position estimation performance). The coefficient can vary with posture, imaging conditions, the location of shielding, and so on, and may be set adaptively by determining those states. In the present embodiment, a prior probability relating to detector performance is obtained in advance and used as the coefficient. Multiplying the detection score by this coefficient yields a corrected score that reflects both the detection result of the detector and its performance in estimating the common part. The corrected score is thus the detection score, which indicates likelihood of the target object, weighted by the certainty of the common part position estimate, and so expresses object likelihood and positional certainty together.
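The multiplication described above can be sketched as a simple lookup. This is a minimal illustration: the coefficient values are the example values quoted in the text, while the dictionary name, the detector keys, and the function name are assumptions made for the example.

```python
# Example coefficients from the text; the key names are hypothetical.
CORRECTION_COEFFICIENT = {
    "head": 1.0,        # head detector: best head position estimate
    "face": 0.8,        # face detector
    "whole_body": 0.5,  # whole-body detector: weakest head position estimate
}


def corrected_score(detector: str, detection_score: float) -> float:
    """Weight a raw detection score by the detector's head-estimation reliability."""
    return CORRECTION_COEFFICIENT[detector] * detection_score
```

A whole-body detection with a raw score of 0.9 is therefore weighted down to 0.45, below a head detection with a raw score of 0.6.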

The correction coefficient may be entered by the user, but it is preferably set according to the probability that the head position estimated by each detector is correct. This probability therefore has to be obtained in advance for each detector. The following describes, with reference to FIG. 6, how the correct-answer probability for head position estimation is obtained and which correction coefficient is stored in each score correction dictionary.

First, a group of image samples with known head positions is prepared. FIG. 6(A) shows an example of an image 600 in which the head position of the person is known; the coordinates of the head range are recorded as the head ground truth 601. The image 600 should preferably show only one person or be cropped to the range of a single person. A large number of such images with known head positions are prepared.

Next, FIG. 6(B) shows the result of performing face detection on the image of FIG. 6(A). As in the detection processing described earlier, the face detector is applied sequentially over the entire image 600. Attention is paid here to the detection result 611, which has the highest face detection score in the image 600. Because the image 600 shows only one person, the detection result 611 with the highest score is considered to be the face.

Next, an estimation result 612 of the head position is calculated from this face detection result. The estimation result 612 is compared with the head ground truth 601 to evaluate whether the head was estimated correctly. For this comparison, the estimate may be regarded as correct if, for example, the distance between the centers of the two positions is within a predetermined range. As another criterion, the overlap rate between the rectangular head ground truth 601 and the estimation result 612 may be calculated, and the estimate regarded as correct when the overlap rate is at or above a predetermined value. The overlap rate α of the rectangles can be calculated, for example, by the following equation (3).

Here, S_b is the area of the head ground truth, S_e is the area of the estimated head range, and S_be is the area of the region where the head ground truth and the estimated head range overlap. This correctness judgment is executed for all of the prepared image samples to obtain the probability that the head estimate is correct, and that probability is used as the correction coefficient. If no detection result is obtained for an image sample, the head estimate is judged to be incorrect.
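The procedure for deriving a correction coefficient can be sketched as follows. Equation (3) itself is not reproduced in this text, so the sketch assumes the common intersection-over-union form, α = S_be / (S_b + S_e − S_be), which is consistent with the three areas defined above but is an assumption. The function names, the tuple layout `(x1, y1, x2, y2)`, and the threshold of 0.5 are likewise hypothetical.

```python
def area(box):
    """Area of a rectangle given as (x1, y1, x2, y2)."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def overlap_rate(truth, estimate):
    """Assumed form of equation (3): S_be / (S_b + S_e - S_be)."""
    inter = (max(truth[0], estimate[0]), max(truth[1], estimate[1]),
             min(truth[2], estimate[2]), min(truth[3], estimate[3]))
    s_be = area(inter)
    union = area(truth) + area(estimate) - s_be
    return s_be / union if union > 0 else 0.0


def correct_probability(samples, threshold=0.5):
    """Fraction of samples whose head estimate is judged correct.

    samples: list of (truth_box, estimated_box), where estimated_box is
    None when the detector produced no result (counted as incorrect).
    """
    if not samples:
        return 0.0
    hits = sum(1 for truth, est in samples
               if est is not None and overlap_rate(truth, est) >= threshold)
    return hits / len(samples)
```

The value returned by `correct_probability` for one detector would then be stored as that detector's correction coefficient.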

For the other detectors, the correct-answer probability of head estimation is likewise obtained for each detector and used as that detector's correction coefficient. In the example shown in FIG. 6(D), the positional relationship between the head position 631 estimated from the detection result 630 of the whole-body detector and the head ground truth 601 is evaluated. In this example the head position 631 deviates greatly from the head ground truth 601, so the head position estimate from the whole-body detector is judged incorrect.

FIG. 6(C) shows an example of the correctness judgment for a detection result of the head detector. The head detector result may also be evaluated against the head ground truth in the same way as the others, and the correction coefficient calculated from its performance in indicating the head position. Because the head detector does not necessarily need to estimate the head position, in that case the position of the detection result itself is evaluated against the head ground truth.

When the correction coefficient is calculated using the overlap rate α described above, the correct-answer probability is computed from a binary correct/incorrect judgment on each image sample. This information may therefore be used to correct the scores by Platt scaling, disclosed in Non-Patent Document 4, or by isotonic regression, disclosed in Non-Patent Document 5. Alternatively, score correction may be omitted entirely.

Once the corrected scores have been calculated by the above processing, the integration result output unit 250 integrates the results of these detectors and combines the information output for the same person by the plurality of detectors into one. Note that the purpose in this embodiment is not to merge duplicate detection results output by the same detector around the same person.

The processing that combines the information output by a plurality of detectors for the same person is described below. FIG. 7 is a flowchart illustrating an example of the integration procedure performed by the integration result output unit 250 in the present embodiment. In the processing shown in FIG. 7, steps S701 to S704 are executed in a loop over the individual output results of the whole-body detector. In the following, the index of the whole-body detector output under consideration is i (i = 1, ..., L), and its estimated head coordinates are X_hB,i.

First, in step S701, it is determined whether any detection results of the whole-body detector remain. If so, the process proceeds to step S702; otherwise the process ends. Next, in step S702, the head position estimation result with the highest overlap rate with the region indicated by the coordinates X_hB,i is selected. The index of the selected estimation result is j (j = 1, ..., M), and its estimated head coordinates are X_hH,j. The overlap rate A_0(X_hB,i, X_hH,j) between the region indicated by X_hB,i and the region indicated by X_hH,j is obtained from the following equation (4).

Here, P(X, Y) is the area of the region where rectangle X and rectangle Y overlap, and S(X) and S(Y) are the areas of rectangle X and rectangle Y, respectively.

Next, in step S703, the face position estimation result with the highest overlap rate with the region indicated by the coordinates X_hB,i is selected. The index of the selected estimation result is k (k = 1, ..., N). Then, in step S704, the vector R_i given by the following equation (5) is output for each detection result i of the whole-body detector.

Here, S_B,i, S_H,j, and S_F,k are the corrected score of the i-th whole-body detector result, the corrected score of the j-th head detector result, and the corrected score of the k-th face detector result, respectively, and their sum is output from the integration result output unit 250 as the integrated score. In this embodiment, the score of each detector is corrected and a simple sum is taken as the integrated score. Depending on the type of detector, correction may be unnecessary; whether it is needed can be judged by comparing detection accuracy. Whether or not the scores are corrected, a linear sum of the detector scores can also be used as the integrated score. The linear coefficients in that case can be obtained by learning, for example with an SVM that takes the detector scores as its input vector.
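The loop of steps S701 to S704 can be sketched as below. Equations (4) and (5) are not reproduced in this text, so the sketch assumes that the overlap rate is P(X, Y) / (S(X) + S(Y) − P(X, Y)) and that the output pairs the whole-body head estimate with the summed score. Requiring a positive overlap before adding a score is also an assumption, and all names are hypothetical.

```python
def overlap(a, b):
    """Assumed form of equation (4) for rectangles (x1, y1, x2, y2)."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union


def integrate(body, head, face):
    """Combine detector outputs per whole-body detection.

    Each argument is a list of (estimated_head_box, corrected_score).
    Returns a list of (estimated_head_box, integrated_score).
    """
    results = []
    for box_b, score_b in body:                      # S701: loop over i
        total = score_b
        for candidates in (head, face):              # S702, then S703
            best = max(candidates,
                       key=lambda d: overlap(box_b, d[0]), default=None)
            if best is not None and overlap(box_b, best[0]) > 0:
                total += best[1]
        results.append((box_b, total))               # S704: output R_i
    return results
```

Because the loop runs over whole-body detections only, a head detection that overlaps no whole-body head estimate never appears in the output.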

FIG. 8 illustrates a specific example of the processing result output from the integration result output unit 250. FIG. 8(A) shows the detection results at the time they are input to the integration result output unit 250, with a plurality of detection results obtained around the person. To simplify the description, the face detector results are omitted from FIG. 8(A), and only the head detector and whole-body detector results are shown.

The wavy-line rectangle 801 is a detection result of the whole-body detector, and the wavy-line rectangle 804 is the head position region estimated from it. The example in FIG. 8(A) shows one whole-body detection result and its head estimation result. The solid-line rectangles 802 and 803 are two detection results of the head detector. They arise because head detection is performed while changing the search position in the image, which yields a plurality of detection results around the person's head. The integration result output unit 250 combines these detection results using the head position, which is the common part, and the estimation information.

FIG. 8(B) shows the result of processing the detection results of FIG. 8(A) in the integration result output unit 250. The head detection result shown by rectangle 802, which overlaps most with the estimated head position shown by rectangle 804 derived from the whole-body detector, is selected and kept as the integration result. Conversely, the head detection result shown by rectangle 803, which is thought to be a false detection by the head detector, is deleted because there is no corresponding whole-body detection result.

<Shielding determination processing>
The shielding determination unit 102 uses the results integrated by the object detection unit 101 to determine whether the detection target is shielded, and corrects the detection score according to the shielding state. In the following, the integrated score output by the object detection unit 101 is referred to as the detection score. Specifically, the shielding determination unit 102 determines shielding by referring in turn to the whole-body estimation result corresponding to a head detection result with a high detection score and to the head detection results in its vicinity. In the present embodiment, the human body to be detected is assumed to be always upright.

FIG. 9 illustrates a state in which the background human body 901, which is the second object, is partially shielded by the foreground human body 900, which is the first object. As the processing result of the object detection unit 101, the head detection results shown by the solid-line rectangles 910 and 911 have been obtained, and the wavy-line rectangles 920 and 921 show the whole-body detection results.

FIG. 10 illustrates the method of determining the shielding state in FIG. 9 and the method of calculating the shielded region. To determine the shielding state, attention is first paid to the rectangle 910 representing the head detection result of the foreground human body 900. Owing to perspective projection, a subject photographed by a camera appears larger the nearer it is and smaller the farther away it is, and the nearer an object is, the less likely it is to be shielded. Based on this principle, all detection results output from the object detection unit 101 are sorted in descending order of detection score and detection frame size. This makes it possible to identify the nearest human body, which is unlikely to be shielded and is likely to cause shielding. Shielding determination against the other detection results is then performed in order, starting from the human body most likely to cause shielding. For human bodies close to one another, however, perspective projection does not necessarily produce a noticeable difference in size. In such a case, which human body is in the foreground may be determined by, for example, considering the continuity of texture.

Then, the intersection points, or the overlapping region 1002, between the rectangle 910 representing the head detection result of the human body 900 and the rectangle 921 representing the whole-body detection result of the human body 901 are calculated. In general, when the head of the human body assumed to be nearest the foreground overlaps the whole-body detection result of another human body, and the head detection score corresponding to that whole-body detection result is higher than a threshold, that other human body can be determined with high probability to be shielded. Among the intersecting or overlapping whole-body detection results, those whose corresponding head detection score is lowest are processed first: part of the human body is determined to be shielded, and the detection score is corrected.

The processing that determines the shielding state and corrects the detection result is described below.
FIG. 12 is a flowchart illustrating an example of the processing procedure performed by the shielding determination unit 102. In the processing shown in FIG. 12, steps S1201 to S1207 are executed in a loop over the individual output results detected by the object detection unit 101. The index of the whole-body detector output under consideration is i (i = 1, ..., L).

First, in step S1201, the detection results are sorted in descending order of detection score and of head size. As described above, this is based on the premise that a human body with a higher detection score and a larger head size is more likely to be in the foreground.

When an object is far from the camera, or when a child's head is detected, the detection score tends to be high while the head size is small. In this case, after sorting in descending order of detection score, the minimum detectable head size is fixed by, for example, setting a threshold on the detection size according to the conditions under which the system being built will actually operate. This makes the likelihood of the detection candidates more reliable.
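The sort of step S1201 with a minimum head size can be sketched as follows. The text does not specify how score and size are combined, so this sketch assumes the score is the primary key and the head area the tie-breaker. The dictionary layout (`"score"`, `"head"`) and the parameter `min_head_area` are hypothetical.

```python
def head_area(det):
    """Area of the head rectangle (x1, y1, x2, y2) of one detection."""
    x1, y1, x2, y2 = det["head"]
    return max(0, x2 - x1) * max(0, y2 - y1)


def sort_foreground_first(detections, min_head_area=0):
    """Drop heads below the minimum size, then sort likely-foreground first.

    Assumed ordering: descending detection score, then descending head area.
    """
    kept = [d for d in detections if head_area(d) >= min_head_area]
    return sorted(kept, key=lambda d: (d["score"], head_area(d)), reverse=True)
```

The first element of the returned list is then the candidate treated as the foreground human body in step S1202.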

Furthermore, when the subject is wearing a tall hat, for example, the detection score may be low while the head size is large. In such a case, it is desirable to include in the first detection processing unit 211 to the n-th detection processing unit 21n a detector that outputs a high detection score for a head wearing a tall hat, correct the detection score, and sort again.

Next, in step S1202, the whole-body and head detection results of the human body estimated to be in the foreground are referred to, and that human body is selected. In the following, the whole-body detection result of the human body estimated to be in the foreground is denoted A_B, and its head detection result A_H. Because sorting was completed in step S1201, the selection proceeds from the highest detection score.

Next, in step S1203, the whole-body and head detection results of background human bodies are referred to, and a human body is selected whose detection score is lower than that of the human body selected in step S1202 and for which a whole-body detection result corresponding to the head detection result has been calculated. In the following, the whole-body detection result of the background human body selected here is denoted B_B, and its head detection result B_H.

Next, in step S1204, it is determined whether the head detection result (A_H) of the foreground human body selected in step S1202 and the whole-body detection result (B_B) of the background human body have two or more intersection points, or whether an overlapping region (A_H∩B_B) exists. If there is no intersection or overlapping region, the process proceeds to step S1207; if there is, the process proceeds to step S1205.

In step S1205, the whole-body detection result (A_B) of the foreground human body is used to calculate the shielded region (A_B∩B_B) on the background human body. In the example shown in FIG. 10, the shielded region 1001 is calculated from the upper-left and lower-right coordinates of each detection result. Let the upper-left point of the whole-body detection result of the human body 901 be (x_j, y_j), and likewise let the upper-left point of the whole-body detection result of the human body 900 be (x_i, y_i). The width w_O and height h_O of the shielded region, (w_O,i, h_O,i), are then given by the following equation (6).

Because the width w_B and height h_B were already calculated when the whole body of the human body 901 was detected, the area S_O of the shielded region 1001 can be obtained by the following equation (7).
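Steps S1204 and S1205 can be sketched with ordinary rectangle intersection. Equations (6) and (7) are not reproduced in this text, so the sketch does not claim to match them term by term; it simply computes the area of A_B∩B_B after checking that A_H∩B_B is non-empty. The function names and the `(x1, y1, x2, y2)` layout are assumptions.

```python
def intersection(a, b):
    """Overlapping rectangle of a and b, or None if they do not overlap."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    if x2 <= x1 or y2 <= y1:
        return None
    return (x1, y1, x2, y2)


def shield_area(fg_head, fg_body, bg_body):
    """Area of the background body hidden by the foreground body.

    Returns 0 when the foreground head does not overlap the background
    whole-body frame (the "no shielding" branch of step S1204).
    """
    if intersection(fg_head, bg_body) is None:
        return 0
    region = intersection(fg_body, bg_body)          # A_B ∩ B_B
    if region is None:
        return 0
    return (region[2] - region[0]) * (region[3] - region[1])
```

For a foreground body (0, 0, 40, 100) with head (10, 0, 30, 20) and a background body (25, 10, 55, 90), the shielded region is 15 wide and 80 high.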

Next, in step S1206, the correction value for the detection score is determined from the area of the shielded region calculated in step S1205. When the human body to be detected is assumed to be upright, as in the example of this embodiment, the detection score falls as the shielded region grows, so the area of the shielded region and the detection score can be considered negatively correlated. Therefore, when shielding is present, the detection score is corrected by the following equation (8).

Here, δ(S_O,i) is a monotonically increasing function whose range is 0 to 1. The correction is designed so that the coefficient is 1 when there is no shielding and becomes larger as the shielded area grows. Applying this correction cancels the drop in detection score caused by shielding.
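One way to realize such a correction is sketched below. Equation (8) is not reproduced in this text, so both the coefficient form (1 + δ) and the choice of δ as the shielded fraction of the whole-body frame are assumptions; they merely satisfy the stated properties (δ is monotonically increasing with range 0 to 1, and the coefficient is 1 without shielding). The function names are hypothetical.

```python
def delta(shield_area, body_area):
    """Assumed delta: shielded fraction of the whole-body frame, clipped to [0, 1]."""
    if body_area <= 0:
        return 0.0
    return min(1.0, max(0.0, shield_area / body_area))


def compensate(score, shield_area, body_area):
    """Raise the detection score to offset the drop caused by shielding.

    Assumed coefficient: 1 + delta, which equals 1 when nothing is shielded.
    """
    return score * (1.0 + delta(shield_area, body_area))
```

Under these assumptions, a body that is half shielded has its score multiplied by 1.5, while an unshielded body keeps its score unchanged.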

In the same way as the correct-answer probability for head position estimation described above, the correlation between the shielding pattern and the detection score of each detector can be obtained for each pattern in which shielding occurs, and the correction coefficients for the shielded case recorded as a score correction dictionary.

To obtain the correction coefficients for the shielded case in advance, a group of image samples with known head and whole-body positions is first prepared. It is desirable to prepare both images in which the human body is shielded and images in which it is not. It is also desirable that the human body appear with the same posture and size in the shielded and unshielded images, so that the change in detection score caused by the presence or absence of shielding can be measured.

First, the object detection unit 101 calculates the detection score S_C,i of a human body that is not shielded. Next, the detection score S_B,i of the human body is calculated for each shielding pattern. The correlation between the shielded area and the detection score is then calculated for each shielding pattern by the least squares method or the like. Because this correlation differs for each shielding pattern, it is desirable to anticipate the shielding patterns that can occur and hold one score correction dictionary per pattern.
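The least squares step can be sketched as an ordinary straight-line fit of detection score against shielded area for one shielding pattern. The text only says "the least squares method or the like", so the linear model, the function name, and the sample numbers in the usage note are assumptions.

```python
def fit_line(areas, scores):
    """Ordinary least squares fit of score = slope * area + intercept."""
    n = len(areas)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_a = sum(areas) / n
    mean_s = sum(scores) / n
    var_a = sum((a - mean_a) ** 2 for a in areas)
    if var_a == 0:
        raise ValueError("all shielded areas are identical")
    cov = sum((a - mean_a) * (s - mean_s) for a, s in zip(areas, scores))
    slope = cov / var_a
    return slope, mean_s - slope * mean_a
```

For instance, scores of 1.0, 0.75 and 0.5 measured at shielded fractions of 0, 0.25 and 0.5 give a slope of −1 and an intercept of 1, reflecting the negative correlation between shielded area and detection score.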

Finally, the detection score is calculated according to the following equation (9) by multiplying the detection score S_B,i by the correction coefficient S_O,p for shielding pattern p, calculated in advance according to the shielding state. In this embodiment, shielding is assumed to occur in the whole-body region other than the head and face, so only the detection score S_B,i output from the whole-body detector is multiplied. Depending on the system being built, however, the score S_H,j output from the head detector or the score S_F,k output from the face detector may also be multiplied by a correction coefficient corresponding to the shielding state.

As described above, preparing a score correction dictionary for each shielding pattern and correcting the detection score makes it possible to detect a human body accurately even when it is shielded.

Next, in step S1207, it is determined whether the shielding determination has been performed for all detection results whose detection score is equal to or greater than the threshold. If detection results that have not yet undergone the shielding determination remain, the process returns to step S1202, a foreground detection result is selected again, and the processing is repeated. When only detection results with a detection score below the threshold remain, the process ends.

FIG. 11 is a diagram illustrating an example of a state in which two or more persons overlap and shielding occurs. In the example shown in FIG. 11, the frontmost human body 1100 shields the human body 1101 behind it, and the human body 1102 further behind is shielded by the two human bodies 1100 and 1101. The shielded region 1103 is the region of the human body 1102 that is shielded by the two human bodies 1100 and 1101.

As described above, based on the principle of perspective projection, in step S1201 all detection results output by the object detection unit 101 are sorted in descending order of detection score and detection frame size. This makes it possible to identify the frontmost human body 1100, which is unlikely to be shielded itself and is a cause of shielding. The shielding determination against the other detection results is then performed in order, starting from the human body most likely to cause shielding.
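The ordering in step S1201, together with the threshold test mentioned for step S1207, can be sketched as below. The text does not say how score and frame size are combined into one ordering, so the lexicographic key (score first, frame area second) is an assumption, as are the class and function names.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str      # hypothetical identifier, used only for this example
    score: float    # detection score
    width: float    # detection frame width
    height: float   # detection frame height


def order_for_shielding_check(detections, threshold):
    """Keep detections at or above the threshold, likely foreground first.

    Sorting is descending by (score, frame area); this tie-breaking rule
    is an assumption, not something stated in the specification.
    """
    candidates = [d for d in detections if d.score >= threshold]
    return sorted(candidates,
                  key=lambda d: (d.score, d.width * d.height),
                  reverse=True)


detections = [
    Detection("far", 0.6, 20, 60),
    Detection("near", 0.9, 60, 180),
    Detection("weak", 0.1, 30, 90),
    Detection("mid", 0.6, 40, 120),
]
ordered = [d.label for d in order_for_shielding_check(detections, 0.5)]
```

Each entry in the ordered list would then be taken in turn as the foreground candidate and compared with the entries after it.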

In addition, the human body 1102 is close to the human body 1101, and the head detector outputs a high detection score for the head of the human body 1102. Therefore, when the detection score of the head of the human body 1102 is higher than that of the head of the human body 1101, a region in which the head of the human body 1102 and the whole body of the human body 1101 appear fused is output as the whole-body detection result of the human body 1101. The human body 1102 is then detected as part of the human body 1101 and is not detected correctly. When such a false detection occurs, the shielding state determination method described in the present embodiment is used to determine that the human body 1102 is shielded. False detection is then prevented by selecting a whole-body rectangle that matches the scale of the head of the human body 1102 and by detecting the head of the human body 1101 together with its whole-body rectangle.

As described above, according to the present embodiment, a detection target can be detected without a drop in detection score even when it is shielded, in contrast to conventional methods. In addition, because the detection score is corrected according to the area of the shielded region obtained from the inclusion relation and according to the shielding pattern, the detection result that is finally output is more accurate than before. Furthermore, the outputs of the head detector and the whole-body detector calculated during object detection are used as they are, so the detection score can be corrected without sophisticated and complicated calculations.

(Second Embodiment)
In the present embodiment, a method of determining the shielding state and a method of correcting the detection score are described for the case where a detection processing unit that detects an object by dividing it into a plurality of movable parts is used. In this embodiment as well, the detection target is a person and the common part is the person's head. Whereas the first embodiment assumed that the person is always upright, the present embodiment can also handle posture changes such as leaning forward or crouching. Descriptions of configurations and processing that are the same as in the first embodiment are omitted.

The overall configuration of the object detection apparatus 10 according to the present embodiment is basically the same as that shown in FIGS. 1 and 2 for the first embodiment. However, the detection targets of the first detection processing unit 211 to the n-th detection processing unit 21n are different, and the processing of the integrated result output unit 250 is also different. As the detectors used in the present embodiment, an example using a head detector and a whole-body detector is described. For detection that copes with small posture changes of an object, part-based detection methods such as the one described in Non-Patent Document 6 are known.

FIG. 13 is a diagram illustrating a whole-body detector that uses a part-based detection method. Each dotted rectangle 1302 in FIG. 13 is one part of the whole-body detector; in the example shown in FIG. 13, the whole-body detector consists of eight parts. The solid rectangle 1301 is the whole-body detection result obtained by part-based detection.

Because the posture of the person differs between the example shown in FIG. 13A and that shown in FIG. 13B, the positions of the parts obtained by detection also differ. A part-based detection result provides an overall detection score calculated from the detection scores of the parts and their positional relationship, together with the positions and extents of the object and of each part, represented by the solid and broken lines in FIG. 13.

An example of estimating the head position (the common part) from the detection result of such a part-based detector is described below. First, the processing in which the first common part estimation unit 221 to the n-th common part estimation unit 22n estimate the head position from the result of the part-based detector is described. In the simplest case, when the detector includes a part whose detection target is the head, the position of that head part can be used as the estimated head position. When the head part does not coincide with the head range to be estimated (for example, when a part covers the range from the head to the shoulders), the head position can be estimated from the detected head part as described in the first embodiment.

On the other hand, when the detector consists of a group of parts none of which clearly corresponds to the head, as shown in FIG. 13, the head position can be estimated from the position information of a plurality of parts. In that case, the head position is obtained by a linear transformation of a vector in which the coordinates of the parts are arranged. For example, the x coordinate x_{h1} of the upper-left corner of the head position is estimated from the eight parts using the linear transformation shown in equation (10).
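Equation (10) itself is not reproduced in this text. A form consistent with the symbols defined for it (X_p, B_{h1}, x_{pn}, y_{pn}, b, w, h) is given below purely as a reconstruction under assumptions. In particular, the ordering of the vector entries and the way w and h enter (here as normalizers of the coefficients) are guesses and are not taken from the source.

```latex
x_{h1} = X_p \, B_{h1}, \qquad
X_p = \begin{pmatrix} 1 & x_{p1} & y_{p1} & \cdots & x_{p8} & y_{p8} \end{pmatrix}, \qquad
B_{h1} = \begin{pmatrix} b_0 & \dfrac{b_1}{w} & \dfrac{b_2}{h} & \cdots & \dfrac{b_{15}}{w} & \dfrac{b_{16}}{h} \end{pmatrix}^{\mathsf{T}}
```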

Here, X_p is the vector of part coordinates with the constant 1 appended, and B_{h1} is the transformation coefficient vector. x_{pn} and y_{pn} are the center coordinates of the n-th part, and b denotes the transformation coefficients of the respective terms for obtaining the x_{h1} coordinate, including the constant term b_0. w and h are the width and height, respectively, of the object region (the solid rectangle 1301 shown in FIG. 13). To obtain the estimated head position X_h, the coordinates y_{h1}, x_{h2}, and y_{h2} are obtained in the same way using different transformation coefficients.

In the example above, the head position is estimated from the center coordinates of the parts alone, but the coordinates of the object region obtained by detection (the solid rectangle 1301 shown in FIG. 13) may be added to the part coordinate vector. The transformation coefficient vector B can be obtained by the least-squares method from a group of image samples annotated with the ground-truth head position and the detection results of the part-based detector on those samples. The method of estimating the head position is not limited to the least-squares method; other regression analyses that take the head position as the objective variable and the positions of the parts as explanatory variables may also be used.
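As a rough illustration of how a coefficient vector B could be fitted and applied, the sketch below solves the least-squares normal equations in plain Python. It is an assumption-laden example: the function names are hypothetical, the normalization by w and h is omitted, and a single part with made-up data stands in for the eight parts.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_coefficients(part_vectors, targets):
    """Least-squares fit of B so that (1, parts...) . B approximates target."""
    rows = [[1.0] + list(v) for v in part_vectors]
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    atb = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(ata, atb)


def estimate_coordinate(part_vector, coefficients):
    """Apply the fitted linear transformation to one part-coordinate vector."""
    return sum(c * v for c, v in zip(coefficients, [1.0] + list(part_vector)))


# Made-up training data generated from x_h1 = 2 + 0.5 * x_p1 - 0.25 * y_p1.
parts = [(0, 0), (4, 0), (0, 4), (4, 4), (2, 8)]
heads = [2.0, 4.0, 1.0, 3.0, 1.0]
b_h1 = fit_coefficients(parts, heads)
x_h1 = estimate_coordinate((8, 4), b_h1)
```

The same fit would be repeated with different targets to obtain the coefficients for y_{h1}, x_{h2}, and y_{h2}.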

Furthermore, when the transformation coefficient vector B of equation (10) is calculated, supplying image samples whose ground truth is the body axis of the human body (the straight line connecting the base of the throat and the center of the waist) instead of the head makes it possible to estimate the body axis from the group of parts. For example, as shown in FIG. 14, a posture in which the human body leans slightly forward from the upright state and a crouching posture in which it is bent further can also be detected. In this case, the coordinate set X_U of the points on the body axis 1401 (the straight line connecting the base of the throat and the center of the waist) can be obtained by equation (11).

To estimate the body axis 1401, the linear transformation of equation (10), which estimates the head position from the eight parts indicated by the wavy-line rectangles 1402 in FIG. 14A, is used. From a group of image samples whose ground truth is the body axis 1401 (from the base of the throat to the center of the waist) instead of the head, and the detection results of the part-based detector on those samples, the body axis 1401 can be estimated by the least-squares method in the same way as the head. As before, the method of estimating the body axis is not limited to the least-squares method, and any means suitable for the system to be constructed may be used. In addition to the head and the body axis, an upper-body rectangle containing the head and the body axis can be calculated by the same processing.

The positions of the detectors for the eight parts vary greatly with the inclination of the human body, as indicated by the wavy-line rectangles 1404 in FIG. 14B. Therefore, to calculate the body axis 1403 accurately, it is advisable to categorize the various inclinations (postures) of the human body and to calculate a transformation coefficient vector B for each category. Increasing the number of posture categories improves the accuracy of posture identification but also increases the amount of computation, so the number of categories is set to balance the two. The number of posture categories to be distinguished must therefore be decided according to the system to be constructed.

<Shielding Determination Method and Score Correction Method>
Next, the shielding determination method and the detection score correction method of the present embodiment are described.
FIG. 15 is a diagram illustrating cases in which the shielding determination is performed from the head estimation results and the body axis estimation results of the plurality of part detectors in the present embodiment. The human bodies 1501 and 1509 are the same human body in the same posture. The human body 1501 shown in FIG. 15A is partially shielded by the foreground human body 1500, and the human body 1509 shown in FIG. 15B is partially shielded by the foreground human body 1508, which is leaning forward.

First, in FIG. 15A, the head detection frames (rectangles 1502 and 1505) and the whole-body detection frames (rectangles 1503 and 1506) of the human bodies are calculated by the method described in the first embodiment. The body axes 1504 and 1507 are then calculated by the method described earlier in this embodiment. Because the human body 1501 is partially shielded, its detection score is lowered, and at this stage it cannot be detected with its proper detection score.

FIG. 16 is a diagram illustrating a method of calculating the area and position of the shielded region 1603. The area and position of the shielded region 1603 are calculated from the eight part detectors shown in FIG. 14 (dotted lines 1600 to 1602 in FIG. 16) and the various detection results indicated by reference numerals 1502 to 1507.

In FIG. 16A, the body axes 1504 and 1507 each form an angle of less than 10 degrees with the vertically upward direction. The postures of the foreground and background human bodies can therefore both be assumed to be standing. In this case, the area and position of the shielded region 1603 are calculated by the method described in the first embodiment. The part detectors that overlap the calculated shielded region (dotted lines 1600 to 1602 in FIG. 16A) are then identified among the eight, and the detection score of the human body 1501 is corrected.

As in the first embodiment, the specific correction method takes as input a group of image samples without shielding and a group with shielding, and divides the cases according to which part detectors are shielded. The correlation describing how much the integrated score drops when a given part detector is shielded is calculated in advance. When no correction coefficient corresponding to the shielding state is applied, the part detectors shielded by the shielded region 1603 are identified, and the output value that each would produce when not shielded is used in place of its actual output, which prevents the integrated score from dropping. Even if the detection scores of the parts without shielding have not been calculated in advance, a suitable constant may be substituted so that the integrated score does not drop.

In the example shown in FIG. 15B, on the other hand, the human body 1508, which is leaning forward, partially shields the upright human body 1509. In this example as well, the head and whole-body detection results are output and the body axes 1510 and 1511 are calculated, as in FIG. 15A.

As shown in FIG. 16B, the body axis 1510 forms an angle of 10 degrees or more with the vertically upward direction, so the calculation method used in the first embodiment cannot be applied as it is. When the angle is 10 degrees or more, the straight line of the body axis 1510 is extended and its intersection (x_{U3}, y_{U3}) with the whole-body detection frame of the background human body is calculated. The region enclosed by the extended line of the body axis 1510 and the whole-body detection frame of the background human body is then calculated as the shielded region 1604.
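The intersection of an extended body axis with a whole-body detection frame can be computed as in the following sketch. The representation of the axis as two points, the rectangle as (left, top, right, bottom) in image coordinates, and the function name are all assumptions made for illustration. Which side of the line is taken as the shielded region is not specified in the text and is not handled here.

```python
def axis_frame_intersections(p0, p1, frame):
    """Points where the infinite line through p0 and p1 crosses the frame border.

    frame is (left, top, right, bottom), with y growing downward.
    Returns zero, one, or two points.
    """
    (x0, y0), (x1, y1) = p0, p1
    left, top, right, bottom = frame
    dx, dy = x1 - x0, y1 - y0
    hits = []
    if dx != 0:  # crossings with the two vertical edges
        for x in (left, right):
            y = y0 + dy * (x - x0) / dx
            if top <= y <= bottom:
                hits.append((x, y))
    if dy != 0:  # crossings with the two horizontal edges
        for y in (top, bottom):
            x = x0 + dx * (y - y0) / dy
            if left <= x <= right and (x, y) not in hits:
                hits.append((x, y))
    return hits


# Hypothetical leaning foreground axis (waist to throat) and background frame.
waist, throat = (10, 100), (30, 40)
background_frame = (40, 0, 100, 200)
crossings = axis_frame_intersections(waist, throat, background_frame)
```

An empty result corresponds to the case where the extended axis does not reach the background frame at all.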

The detection score is corrected in the same way as in FIG. 16A: the part detectors that have a region overlapping the shielded region 1604 are identified, and the precomputed detection scores of those part detectors without shielding are used instead.

The positions and detection scores of the individual part detectors in the present embodiment change greatly as the posture of the human body varies. Therefore, to perform the score correction described above, the correlation between the displacement of each part detector and its detection score must be calculated in advance for each posture category using the method described in Non-Patent Document 6.

In addition, because the body axis 1510 is a straight line inside the human body 1508, it does not indicate the actual shielded region. When the shielded region must be estimated precisely, edge information of the human body 1508 may be extracted and used instead of the body axis 1510 to obtain the shielded region.

FIG. 17 is a flowchart illustrating an example of the processing procedure performed by the shielding determination unit 102 in the present embodiment.
First, the processing of step S1201 is substantially the same as that of step S1201 in FIG. 12 described in the first embodiment.

Next, in step S1701, the whole-body and head detection results and the body axis detection result of the human body estimated to be in the foreground are referred to, and that human body is selected. Hereinafter, its body axis detection result is denoted A_U.

Next, in step S1702, the whole-body and head detection results and the body axis detection result of a background human body are referred to, and a human body is selected whose detection score is lower than that of the human body selected in step S1701 and for which a whole-body detection result corresponding to the head detection result has been calculated. Hereinafter, its body axis detection result is denoted B_U. Thus, in the present embodiment, the body axis detection result is referred to in addition to the head and whole-body detection results.

Next, in step S1703, it is determined whether the straight line of the body axis A_U of the human body A presumed to be in the foreground (referred to in step S1701), or the extension of that line, intersects the whole-body detection result B_B of the human body B presumed to be in the background. If there is no intersection, no object is causing shielding, and the process returns to step S1701.

If the determination in step S1703 finds an intersection, the inclination (angle) of the body axis A_U relative to the vertically upward direction is calculated in step S1704. In step S1705, it is determined whether the angle of the body axis A_U is less than 10 degrees. If it is, the foreground human body can be judged to be in a standing posture, and the process proceeds to step S1706. If the angle is 10 degrees or more, the foreground human body is judged to be leaning forward or crouching, and the process proceeds to step S1707. For simplicity of explanation, the processing is switched at a body axis angle of 10 degrees here, but the angle regarded as standing may be set arbitrarily depending on the system to be constructed.
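Steps S1704 and S1705 can be illustrated with a short sketch. It assumes that the body axis is given by its throat and waist end points in image coordinates with y growing downward; the function names and the posture labels are hypothetical, and the 10-degree limit is exposed as a parameter because the text allows it to be set per system.

```python
import math


def axis_angle_from_vertical(throat, waist):
    """Angle in degrees between the waist-to-throat vector and straight up."""
    dx = throat[0] - waist[0]
    dy = waist[1] - throat[1]  # positive when the throat is above the waist
    return abs(math.degrees(math.atan2(dx, dy)))


def classify_posture(throat, waist, standing_limit_deg=10.0):
    """Return 'standing' below the limit, otherwise 'leaning_or_crouching'."""
    angle = axis_angle_from_vertical(throat, waist)
    return "standing" if angle < standing_limit_deg else "leaning_or_crouching"


upright = classify_posture(throat=(50, 20), waist=(50, 100))
leaning = classify_posture(throat=(90, 60), waist=(50, 100))
```

The result selects between the first-embodiment region calculation (step S1706) and the axis-based calculation (step S1707).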

In step S1706, the foreground human body is regarded as standing, and the position and area of the shielded region are calculated using the method described in the first embodiment. In step S1707, on the other hand, the position and area of the region enclosed by the body axis A_U, or its extension, and the whole-body detection result B_B are calculated.

Next, in step S1708, the part detectors that were shielded are identified among the eight part detectors, and the integrated score is corrected by replacing their detection scores with the detection scores of those part detectors without shielding. The detection score of a part detector may instead be replaced with a preset constant. Step S1207 is the same as step S1207 in FIG. 12.
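A minimal sketch of the replacement in step S1708 follows. Several things here are assumptions rather than statements from the specification: the integrated score is taken to be a plain sum of part scores, the shielded region is approximated by an axis-aligned rectangle, and the function names and numeric values are invented for the example.

```python
def rects_overlap(a, b):
    """True if rectangles (left, top, right, bottom) share a positive area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def corrected_integrated_score(parts, shielded_region, unshielded_scores):
    """Sum part scores, substituting stored scores for shielded parts.

    parts is a list of (rect, score); unshielded_scores[i] is the score
    (or a preset constant) used when part i overlaps the shielded region.
    """
    total = 0.0
    for index, (rect, score) in enumerate(parts):
        if rects_overlap(rect, shielded_region):
            score = unshielded_scores[index]
        total += score
    return total


# Two hypothetical parts: the lower one falls inside the shielded region.
parts = [((0, 0, 10, 10), 0.9), ((0, 10, 10, 20), 0.2)]
shielded = (0, 12, 10, 30)
integrated = corrected_integrated_score(parts, shielded, [0.8, 0.7])
```

Passing the same constant for every entry of `unshielded_scores` corresponds to the preset-constant variant mentioned in the text.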

Although the present embodiment has been described for the case where the background person is standing, shielding can be detected by the same processing when that person is in another posture, such as leaning forward or crouching. The processing that integrates the head position estimation frames estimated from the whole-body detector is the same as in the first embodiment. In the present embodiment, even when the shielding is caused by a body in a posture other than standing, the detection score can be corrected according to that posture, so detection is accurate.

(Third Embodiment)
The first and second embodiments described cases in which a part of the background person other than the head is shielded. The present embodiment describes a method of detecting shielding when the heads, or the common parts, of the foreground and background persons overlap. In this embodiment as well, the detection target is a person and the common part to be estimated is the person's head. In the following description, the shielding flag is defined to be ON when the heads of the foreground and background persons overlap. Descriptions of configurations and processing that are the same as in the first and second embodiments are omitted.

FIG. 18 is a diagram illustrating the detection processing when the background human body 1801 is shielded by the foreground human body 1800 and the heads of the two human bodies overlap.
In a scene such as that shown in FIG. 18, if detection is performed using the object detection unit 101 described in the first or second embodiment, only one body, corresponding to the head detection result 1802, is detected even though two human bodies are present. This is because, as a result of the integration processing of the object detection unit 101, the body axis and whole-body detection results with the lower detection score are merged into the head, whole-body, and body axis detection results with the higher detection score.

In the present embodiment, therefore, the body axis detection results are used before the integration processing to detect whether the heads overlap, and when they do, the integration processing of the object detection unit 101 is switched from head-based to body-axis-based. Switching to body-axis-based integration yields the correct output that head-based integration cannot.

When the heads overlap as in the present embodiment, the respective integrated scores do not drop much. This is because objects are detected with a part-based detector in the present embodiment, which is not particularly affected by discontinuities in the object. Since the drop in detection score is small even when shielding occurs due to overlapping heads, the detection score correction processing is not performed in the present embodiment. However, when a part other than the head is set as the common part, the detection score may be corrected according to the presence or absence of shielding, depending on the characteristics of the system to be constructed.

FIG. 19 is a flowchart illustrating an example of the shielding determination procedure performed by the shielding determination unit 102 in the present embodiment. It is assumed that the processing of the object detection unit 101 described in the second embodiment has provisionally calculated the head detection result 1802, the whole-body detection results 1803 and 1804, and the body axes 1805 and 1806 of the human bodies 1800 and 1801. The processing in steps S1201, S1207, S1701, and S1702 is the same as that in FIG. 17 described in the second embodiment, so its description is omitted.

Next, in step S1901, the intersection of the extended body axis A_U of the human body A and the extended body axis B_U of the human body B, which were referred to in steps S1701 and S1702, is calculated. It is then determined whether the intersection lies inside the rectangle of the head detection result 1802. If the intersection is not inside the rectangle, the integration processing can use the head detection results, so the shielding flag remains OFF and the process proceeds to step S1902. If the intersection is inside the rectangle, the heads paired with the two body axes are judged to overlap, the shielding flag is set to ON, and the process proceeds to step S1903.
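The test in step S1901 reduces to a line-line intersection followed by a point-in-rectangle check, which could be sketched as follows. The representation of each body axis as a pair of points, the rectangle as (left, top, right, bottom), and the function names are assumptions. Treating parallel axes as "no overlap" is also an assumption, since the text does not discuss that case.

```python
def line_intersection(a0, a1, b0, b1):
    """Intersection of the infinite lines a0-a1 and b0-b1, or None if parallel."""
    d1x, d1y = a1[0] - a0[0], a1[1] - a0[1]
    d2x, d2y = b1[0] - b0[0], b1[1] - b0[1]
    denom = d1x * d2y - d1y * d2x
    if denom == 0:
        return None
    t = ((b0[0] - a0[0]) * d2y - (b0[1] - a0[1]) * d2x) / denom
    return (a0[0] + t * d1x, a0[1] + t * d1y)


def shielding_flag(axis_a, axis_b, head_rect):
    """True (flag ON) if the extended axes cross inside the head rectangle."""
    point = line_intersection(axis_a[0], axis_a[1], axis_b[0], axis_b[1])
    if point is None:
        return False
    left, top, right, bottom = head_rect
    return left <= point[0] <= right and top <= point[1] <= bottom


# Hypothetical axes (waist, throat) converging toward one detected head.
axis_a = ((40, 100), (50, 40))
axis_b = ((70, 100), (60, 40))
flag_on = shielding_flag(axis_a, axis_b, head_rect=(45, 0, 65, 30))
```

A flag of True would route processing to the body-axis-based integration of step S1903, and False to the head-based integration of step S1902.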

In step S1902, because step S1901 has determined that the heads of human body A and human body B do not overlap, the detection score is calculated by the integration process that includes the head detection results, as described in the first and second embodiments.

In step S1903, because it can be determined that a shielding state is caused by the head of human body A or human body B, the head detection results are not used in the integration process. Instead, the integration process based on the body-axis angle described in the second embodiment is performed.

FIG. 20 is a diagram illustrating an example of the head and body-axis detection results before the integration process. In FIG. 20, the head detection result candidates 2001 denote a group of detection result candidates, which correspond to the body axes 1806 and 2002, respectively.

If the integration process described in the first embodiment is applied, the head detection result candidates 2001 are merged into the head detection result 1802. Because the integration process of the first embodiment operates on the detection score and the degree of head overlap, the head detection result of the human body 1801, whose head is shielded, is absorbed into the head detection result 1802 of the human body 1800. In other words, if integration is performed with the head as the reference while the shielding flag is set, the head, body-axis, and whole-body detection results of the shielded human body 1801 are all absorbed into the corresponding detection results of the shielding human body 1800.

Therefore, in this embodiment, when the shielding flag is ON, the criterion for the integration process is switched from the head to the body axis. As shown in FIG. 20, multiple body axes 2002 are detected, just as multiple head detection results 1901 are, so the detection results are integrated based on the angle between the body axis with the highest detection score and the straight line of each body-axis detection candidate. For example, when the body axis 1805 is compared with the other body-axis detection results using an angle criterion (for example, a threshold of 30 degrees), every body axis 2002 forms an angle of less than 30 degrees with it and is therefore integrated into the body axis 1805. The body axis 1806 differs by 30 degrees or more and is therefore not integrated. Instead of the angle, the end points of the body-axis detection results (the center coordinates of the throat and the waist) may be used to perform the integration based on the distance between the two straight lines.
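The angle-based integration can be illustrated with the following minimal sketch. Several points are assumptions rather than details from the text: the `BodyAxis` record and function names are hypothetical, the angle is taken as the acute angle between the undirected lines, and the procedure is repeated greedily on the candidates that survive each pass (the text only describes the comparison against the highest-scoring body axis).

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class BodyAxis:
    throat: Point   # end point at the base of the throat
    waist: Point    # end point at the center of the waist
    score: float    # detection score


def angle_between(a: BodyAxis, b: BodyAxis) -> float:
    """Acute angle in degrees between the lines carrying two body axes."""
    ta = math.atan2(a.waist[1] - a.throat[1], a.waist[0] - a.throat[0])
    tb = math.atan2(b.waist[1] - b.throat[1], b.waist[0] - b.throat[0])
    diff = abs(math.degrees(ta - tb)) % 180.0
    return min(diff, 180.0 - diff)


def integrate_by_angle(axes: List[BodyAxis], threshold_deg: float = 30.0) -> List[BodyAxis]:
    """Keep the best-scoring axis and absorb candidates closer than threshold_deg.

    Assumption: the step is repeated on the remaining candidates, so each
    surviving axis differs from every earlier kept axis by at least the threshold.
    """
    remaining = sorted(axes, key=lambda ax: ax.score, reverse=True)
    kept: List[BodyAxis] = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        # Candidates within the threshold are merged into (absorbed by) `best`.
        remaining = [ax for ax in remaining if angle_between(best, ax) >= threshold_deg]
    return kept


if __name__ == "__main__":
    candidates = [
        BodyAxis((0.0, 0.0), (0.0, 10.0), 0.9),    # upright, highest score
        BodyAxis((0.0, 0.0), (1.0, 10.0), 0.7),    # nearly upright
        BodyAxis((0.0, 0.0), (-2.0, 10.0), 0.6),   # nearly upright
        BodyAxis((0.0, 0.0), (10.0, 10.0), 0.8),   # tilted by 45 degrees
    ]
    for axis in integrate_by_angle(candidates):
        print(axis)
```

With the 30-degree threshold used as the example in the text, the nearly upright candidates are merged into the upright axis, while the tilted axis survives as a separate detection.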

Note that the processing of this embodiment is exception handling intended to prevent detection result candidates that should be detected from being absorbed into other results by the integration process when shielding occurs. Depending on the detection results, many detection candidates may have the shielding flag ON. In that case, the candidates whose shielding flag is ON are sorted by detection score, and a preset number of detection results (for example, 1 or 2) are selected as detection candidates in descending order of detection score. Detection results that are not selected as detection candidates are deleted by the integration process. The number of detection candidates to retain is preferably set according to the environment of the system being built.
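The selection of flagged candidates can be sketched as below. The `Detection` record, its field names, and `select_flagged_candidates` are hypothetical. The sketch only covers the candidates whose shielding flag is ON; unflagged candidates are assumed to go through the ordinary integration process elsewhere.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Detection:
    label: str            # illustrative identifier for the candidate
    score: float          # detection score
    shielding_flag: bool  # True when the shielding flag is ON


def select_flagged_candidates(detections: List[Detection], keep: int = 2) -> List[Detection]:
    """Sort candidates whose shielding flag is ON by score and keep the top `keep`.

    Flagged candidates that are not returned would be deleted by integration.
    """
    flagged = [d for d in detections if d.shielding_flag]
    flagged.sort(key=lambda d: d.score, reverse=True)
    return flagged[:keep]


if __name__ == "__main__":
    candidates = [
        Detection("a", 0.4, True),
        Detection("b", 0.9, True),
        Detection("c", 0.7, False),
        Detection("d", 0.6, True),
    ]
    for det in select_flagged_candidates(candidates, keep=2):
        print(det.label, det.score)
```

The `keep` parameter corresponds to the preset number mentioned in the text, which would be tuned to the deployment environment.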

As described above, the integration method of this embodiment uses the body-axis detection results for multiple human bodies that stand one behind the other in different postures. Even when their head detection results overlap, the human bodies are therefore detected robustly without being deleted in the subsequent integration process.

(Other embodiments)
The present invention can also be realized by executing the following process: software (a program) that implements the functions of the above-described embodiments is supplied to a system or apparatus via a network or various storage media, and a computer (or a CPU, MPU, or the like) of that system or apparatus reads and executes the program.

101 Object detection unit
102 Shielding determination unit

Claims

An object detection apparatus that detects a first object located in the foreground and a second object located in the background from an input image, the apparatus comprising:
a first detection unit that detects partial regions of the first and second objects;
a second detection unit that detects information indicating postures of the first and second objects;
a determination unit that determines a shielding state of the second object by the first object based on the partial regions detected by the first detection unit and the information detected by the second detection unit; and
a correction unit that corrects the detection result of the second object according to the shielding state determined by the determination unit.

The object detection apparatus according to claim 1, wherein the first and second detection units detect the position and the size by dividing the object into a plurality of parts.

The object detection apparatus according to claim 1, wherein the first detection unit detects a position and a size of a human head.

The object detection apparatus according to claim 1, wherein the second detection unit detects the position and size of the whole body of the human body.

The object detection apparatus according to claim 4, wherein the determination unit determines the shielding state based on whether the partial region of the first object detected by the first detection unit intersects the region of the second object detected by the second detection unit.

The object detection apparatus according to claim 4, wherein the determination unit determines the shielding state and calculates a shielding region based on an inclusion relationship between the partial region of the first object detected by the first detection unit and the region of the second object detected by the second detection unit.

The object detection apparatus according to claim 6, wherein the correction unit corrects a detection result of the second object according to an area of the shielding region.

The object detection apparatus according to claim 4, wherein the second detection unit further detects a body axis of the human body.

The object detection apparatus according to claim 8, wherein the determination unit switches the determination method of the shielding state according to the angle of the body axis detected by the second detection unit.

The object detection apparatus according to claim 8 or 9, wherein the determination unit calculates, as a shielding region, a region generated when the body axis of the first object detected by the second detection unit intersects a rectangle representing the whole body of the second object.

The object detection apparatus according to claim 8, wherein the determination unit determines that the partial regions of the first and second objects overlap when straight lines extended from the body axes of the first and second objects detected by the second detection unit intersect within the partial region of the first object detected by the first detection unit.

The object detection apparatus according to claim 11, wherein the correction unit does not correct the detection result of the second object when the determination unit determines that the partial regions of the first and second objects overlap.

The object detection apparatus according to claim 11, wherein, when the determination unit determines that the partial regions of the first and second objects overlap, the determination unit changes the detection result by the first detection unit to a detection result based on the body axis.

The object detection apparatus according to claim 13, wherein, when determining that the partial regions of the first and second objects overlap, the determination unit determines the angle formed between each body axis and a detection result candidate.

An object detection method for detecting a first object located in the foreground and a second object located in the background from an input image, the method comprising:
a first detection step of detecting partial regions of the first and second objects;
a second detection step of detecting information indicating postures of the first and second objects;
a determination step of determining a shielding state of the second object by the first object based on the partial regions detected in the first detection step and the information detected in the second detection step; and
a correction step of correcting the detection result of the second object according to the shielding state determined in the determination step.

A program for controlling an object detection apparatus that detects a first object located in the foreground and a second object located in the background from an input image, the program causing a computer to execute:
a first detection step of detecting partial regions of the first and second objects;
a second detection step of detecting information indicating postures of the first and second objects;
a determination step of determining a shielding state of the second object by the first object based on the partial regions detected in the first detection step and the information detected in the second detection step; and
a correction step of correcting the detection result of the second object according to the shielding state determined in the determination step.