JP7536464B2

JP7536464B2 - IMAGE PROCESSING APPARATUS, IMAGING APPARATUS, IMAGE PROCESSING METHOD, AND PROGRAM

Info

Publication number: JP7536464B2
Application number: JP2020026506A
Authority: JP
Inventors: Ryosuke Tsuji (辻良介)
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2020-02-19
Filing date: 2020-02-19
Publication date: 2024-08-20
Anticipated expiration: 2040-02-19
Also published as: JP2021132298A

Description

The present invention relates to an image processing device, an imaging device, an image processing method, and a program.

For example, imaging devices such as digital cameras and digital video cameras perform image processing that automatically detects a specific subject pattern in an image. Such processing makes it possible to identify, for example, a human face region. Patent Document 1 proposes a related technique in which AF/AE/WB evaluation values are detected from a captured image whose change relative to the frame image subjected to face region detection is within a predetermined amount. In recent years, convolutional neural networks (hereinafter, CNNs) have also been used to detect subjects in images; CNN-related technology is disclosed in Non-Patent Document 1.

Patent Document 1: JP 2005-318554 A
Non-Patent Document 1: Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," Advances in Neural Information Processing Systems 25 (NIPS'12), 2012.

Dictionary data can be used to detect subjects. Because dictionary data is generated for each subject, detecting several types of subject requires performing subject detection while sequentially switching among multiple sets of dictionary data. With such sequential switching, it may take a long time before detection is run with the dictionary data suited to the target subject. Furthermore, if detection is run with dictionary data whose characteristics differ from those of the target subject, the subject may be falsely detected. Both problems become more pronounced as the number of dictionaries grows.

The present invention aims to select dictionary data efficiently when performing subject detection.

To achieve the above object, an image processing device of the present invention comprises: an estimation means for estimating, based on captured image data, the likelihood of the presence or absence of a subject for each of a plurality of dictionary data used for subject detection; a subject detection means for detecting a subject from the image data using dictionary data set according to the estimated likelihood; and a control means for switching among the plurality of dictionary data set in the subject detection means, wherein the control means switches among the plurality of dictionary data excluding any dictionary data whose likelihood is less than a predetermined threshold.

According to the present invention, dictionary data can be selected efficiently when performing subject detection.

FIG. 1 is a cross-sectional view of the imaging device.
FIG. 2 is a diagram illustrating the relationship between the system control unit and each unit.
FIG. 3 is a flowchart showing the process flow of the present embodiment.
FIG. 4 is a diagram illustrating an example of switching of dictionary data.

Each embodiment of the present invention will be described in detail below with reference to the drawings. However, the configurations described in the following embodiments are merely examples, and the scope of the present invention is not limited to them.

The present embodiment will be described below with reference to the drawings. The imaging device 100 of this embodiment generates an image signal from a subject image formed by a plurality of imaging optical systems and performs subject detection. FIG. 1 is a cross-sectional view of the imaging device 100; the configuration of the imaging device 100 is not limited to the example of FIG. 1. The imaging device 100 has a camera body 101 and a photographing lens 102. The camera body 101 is, for example, the body of a digital single-lens reflex camera, and the photographing lens 102 is detachably attached to its front. The imaging device 100 also has a system control unit 201, which includes a CPU, a RAM, and a ROM. The processing of this embodiment may be realized, for example, by loading a control program stored in the ROM into the RAM and having the CPU execute it. The CPU may be, for example, a multi-core CPU capable of processing multiple tasks in parallel.

The photographing lens 102 is an interchangeable lens and is also electrically connected to the camera body 101 via a mount contact group 115. A focusing lens 113 and an aperture shutter 114 are provided inside the photographing lens 102. Control exercised via the mount contact group 115 adjusts the amount of light admitted through the photographing lens 102 and the focus.

Next, the camera body 101 will be described. The camera body 101 has a quick-return mirror composed of a main mirror 103 and a sub-mirror 104. The main mirror 103 is a half mirror; in the viewfinder observation state it is positioned obliquely in the photographing optical path and reflects the light beam entering from the photographing lens 102 toward the viewfinder optical system. The light transmitted through the main mirror 103 reaches the distance measurement sensor 105 via the sub-mirror 104. The distance measurement sensor 105 forms a secondary image of the photographing lens 102 on a focus detection line sensor and thereby generates an AF image signal from which the focus adjustment state of the photographing lens 102 can be detected by the phase difference method. The AF image signal is transmitted to the system control unit 201, which detects the focus state of the focusing lens 113 from it and performs focus adjustment by driving the focusing lens 113 according to the focus detection result.
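
As a rough illustration of the phase difference principle mentioned above: a focus detection line sensor yields a pair of one-dimensional signals (often called the A and B images), and the focus state is inferred from how far one is displaced relative to the other. The sketch below is not taken from this document; it simply finds the integer displacement that minimizes the mean absolute difference over the overlapping samples. The function name is hypothetical, and the conversion from displacement to defocus amount (which depends on the optical system) and sub-pixel interpolation are omitted.

```python
def phase_difference_shift(a, b, max_shift):
    """Return the shift s in [-max_shift, max_shift] minimizing mean |a[i] - b[i + s]|.

    a, b: equal-length line-sensor signals (hypothetical A/B images).
    """
    n = len(a)
    best_shift, best_cost = 0, float("inf")
    for s in range(-max_shift, max_shift + 1):
        # Indices i for which both a[i] and b[i + s] exist.
        lo, hi = max(0, -s), min(n, n - s)
        if hi <= lo:
            continue
        cost = sum(abs(a[i] - b[i + s]) for i in range(lo, hi)) / (hi - lo)
        if cost < best_cost:
            best_shift, best_cost = s, cost
    return best_shift
```

For example, with `a = [0, 0, 1, 5, 9, 5, 1, 0, 0, 0, 0, 0]` and `b` equal to `a` shifted right by two samples, `phase_difference_shift(a, b, 3)` returns `2`; the sign of the shift distinguishes front focus from back focus.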

The focusing screen 106, which forms part of the viewfinder optical system, is disposed at the intended imaging plane of the photographing lens 102. The pentaprism 107 redirects the viewfinder optical path. The photometry sensor 108 generates, from incident light, image data having a luminance signal and color difference signals. Specifically, the photometry sensor 108 generates an AE image signal from the light arriving from the subject and transmits it to the system control unit 201, which uses the received AE image signal for exposure control and the like. The system control unit 201 also optimizes focus and exposure based on the subject detected by the subject detection unit 204, described later. The camera body 101 is provided with an eyepiece 109, through which the photographer can observe the focusing screen 106 to check the shooting screen and shooting information.

The camera body 101 also has a focal plane shutter 110 and an image sensor 111. In FIG. 2, the focal plane shutter 110 is labeled "shutter". During exposure, the main mirror 103 and the sub-mirror 104 retract from the photographing optical path and the focal plane shutter 110 opens, exposing the image sensor 111. The focal plane shutter 110 shields the image sensor 111 from light when no image is being captured, and opens during capture so that the subject light beam reaches the image sensor 111.

The image sensor 111, which serves as an imaging unit, is a CCD sensor, a CMOS sensor, or the like, and may include an infrared cut filter, a low-pass filter, and the like. The image sensor 111 photoelectrically converts the subject image formed through the photographing optical system of the photographing lens 102 and transmits an image signal for generating a captured image to the system control unit 201. The system control unit 201 generates a captured image from the received image signal, stores it in the image storage unit 202, and displays it on the display unit 112, such as an LCD. The operation unit 203 detects user operations performed via the release button, switches, connected devices, and the like on the camera body 101, and transmits a signal corresponding to the operation to the system control unit 201. When the release button is pressed halfway, the release switch SW1 turns on and shooting preparation operations such as AF and AE are performed. When the release button is pressed fully, the release switch SW2 turns on and a still image is captured. The display unit 112 displays the most recently captured still image for a fixed period so that the user can check the result.

FIG. 2 is a diagram showing the relationship between the system control unit 201 and each unit. In addition to the units described above, the subject detection unit 204, the dictionary estimation unit 205, and the dictionary data storage unit 206 are connected to the system control unit 201, which serves as a control means. The subject detection unit 204 and the dictionary estimation unit 205 may be part of the system control unit 201 or may be provided separately from it. The system control unit 201, the subject detection unit 204, and the dictionary estimation unit 205 may together constitute an image processing device, which may be installed in, for example, a smartphone or a tablet terminal.

The subject detection unit 204, serving as a subject detection means, performs subject detection based on dictionary data generated by machine learning. In this embodiment, the subject detection unit 204 uses a separate set of dictionary data for each subject in order to detect multiple types of subject. Each set of dictionary data is, for example, data in which the features of the corresponding subject are registered. The subject detection unit 204 performs subject detection while sequentially switching among the per-subject dictionary data, all of which are stored in the dictionary data storage unit 206. The system control unit 201 determines which of these dictionary data to use for subject detection based on the estimation result of the dictionary estimation unit 205.

The dictionary estimation unit 205 estimates, based on image data, the likelihood of the presence or absence of a subject for each set of dictionary data stored in the dictionary data storage unit 206, and outputs these likelihoods as its estimation result, indicating which subjects are likely to be contained in the image. The subject detection unit 204 then estimates the position of the subject in the image based on the captured image data and the dictionary data selected according to that estimate. The subject detection unit 204 may estimate and output the position, size, reliability, and the like of the subject, and may also output other information.

Examples of dictionary data include dictionary data for detecting a "person", dictionary data for detecting an "animal", and dictionary data for detecting a "vehicle" as the subject. Dictionary data for detecting "an entire person" and dictionary data for detecting "a person's face" may also be stored separately in the dictionary data storage unit 206. Based on the image data, the dictionary estimation unit 205 outputs, for example, the likelihood of the presence or absence of a subject for each of the "person", "animal", and "vehicle" dictionary data as a probability.
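
The document says the likelihoods are output as probabilities but does not say how they are computed. Because one image can contain several subject types at once, and because the example values given later for FIG. 4(b) (0%, 40%, 80%) do not sum to 100%, an independent sigmoid per dictionary (multi-label output) is one plausible choice. This is an assumption, as are the function names and example scores below.

```python
import math


def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dictionary_likelihoods(scores):
    """Map one raw CNN output score per dictionary to an independent probability.

    Assumption: multi-label (sigmoid) output, so the probabilities need not sum to 1.
    """
    return {name: _sigmoid(z) for name, z in scores.items()}
```

For instance, `dictionary_likelihoods({"person": 2.0, "animal": -3.0, "vehicle": 0.0})` gives roughly 0.88, 0.05, and 0.50. A softmax would instead force the dictionaries to compete for a total of 100%.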

The subject detection unit 204 detects subjects in the image data using dictionary data with a high likelihood. In this embodiment, the dictionary estimation unit 205 and the subject detection unit 204 are each implemented as a separate machine-learned CNN: the former outputs the likelihood for each set of dictionary data, and the latter estimates the position and other attributes of the subject contained in the image data. Both units may be realized on a GPU (graphics processing unit) or on a circuit specialized for CNN inference.

Machine learning of the CNNs may be performed by any method. For example, a predetermined computer such as a server may train the CNNs, and the imaging device 100 may obtain the trained CNNs from that computer. The CNN of the subject detection unit 204 may be trained by supervised learning in which the computer uses training image data as input and the position and other attributes of the subject in that image data as teacher data. Likewise, the CNN of the dictionary estimation unit 205 may be trained by supervised learning using training image data as input and the dictionary data corresponding to the subject in that image data as teacher data. Trained CNNs are generated in this way. The CNNs may also be trained by the imaging device 100 or by the image processing device described above.

Next, the process flow of this embodiment will be described. FIG. 3 is a flowchart showing the process flow of this embodiment. The series of steps S301 to S310 in FIG. 3 corresponds to the processing performed by the imaging device 100 for one frame (one piece of image data). In step S301, the system control unit 201 determines whether the release switch SW1 or the release switch SW2 is on. If the determination in step S301 is Yes, the system control unit 201 advances to the next frame and proceeds to step S302; if No, it ends the process.

In step S302, the system control unit 201 causes the photometry sensor 108 to accumulate charge and reads out the generated image signal as an AE image signal, and causes the distance measurement sensor 105 to accumulate charge and reads out the generated image signal as an AF image signal. In FIG. 3, the AE image signal and the AF image signal are collectively labeled "image signals". In step S303, the dictionary estimation unit 205 takes the AE image signal read out in step S302 as its input image and outputs the likelihood of the presence or absence of a subject for each set of dictionary data. In step S304, the system control unit 201 selects dictionary data based on these likelihoods and sets it as the dictionary data for subject detection. For example, the system control unit 201 may select only the single dictionary with the highest likelihood, or may select every dictionary whose likelihood exceeds a predetermined threshold.
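
Step S304 offers two selection policies. A minimal sketch of both follows, assuming the likelihoods arrive as a dict from dictionary name to probability; the function name and the dictionary names are illustrative.

```python
def select_dictionaries(likelihoods, threshold=None):
    """Step S304 sketch: choose which dictionaries to set in the subject detector.

    threshold is None -> keep only the single most likely dictionary.
    otherwise         -> keep every dictionary whose likelihood exceeds threshold,
                         ordered from most to least likely.
    """
    if threshold is None:
        return [max(likelihoods, key=likelihoods.get)]
    kept = [d for d, p in likelihoods.items() if p > threshold]
    return sorted(kept, key=likelihoods.get, reverse=True)
```

With `{"person": 0.7, "animal": 0.1, "vehicle": 0.4}`, the first policy returns `["person"]` and a threshold of 0.3 returns `["person", "vehicle"]`. Note that the first policy always returns one dictionary, even when every likelihood is low.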

In step S305, the subject detection unit 204 performs subject detection on the AE image signal read out in step S302, using the dictionary data set in step S304, and outputs information such as the position, size, and reliability of the detected subject. The system control unit 201 may cause the display unit 112 to display this information, and may also cause it to display the likelihood of each set of dictionary data output by the dictionary estimation unit 205.

In step S306, the system control unit 201 selects the focus detection point closest to the position of the subject detected in step S305 and detects the focus state at that point using the AF image signal acquired in step S302. If no subject was detected in step S305, the system control unit 201 performs focus detection at all focus detection points and then selects the point whose in-focus position is closest to the imaging device 100 (that is, the point covering the nearest object). In step S307, the system control unit 201 adjusts the focus position of the focusing lens 113 based on the focus state at the focus detection point selected in step S306.
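
The focus point choice in step S306 might be sketched as follows. The representation of a focus detection point as `(x, y, subject_distance)` and the function name are assumptions made for illustration; the document does not specify data formats.

```python
import math


def choose_focus_point(points, subject_center=None):
    """Step S306 sketch; returns the index of the selected focus detection point.

    points: list of (x, y, subject_distance) per focus detection point, where
            subject_distance is what focus detection at that point reports.
    subject_center: (x, y) of the subject found in S305, or None if none was found.
    """
    if subject_center is not None:
        sx, sy = subject_center
        # The point whose position is nearest the detected subject.
        return min(range(len(points)),
                   key=lambda i: math.hypot(points[i][0] - sx, points[i][1] - sy))
    # No subject: use every point's result and take the one focused nearest the camera.
    return min(range(len(points)), key=lambda i: points[i][2])
```

With `pts = [(0, 0, 5.0), (10, 0, 2.0), (20, 0, 8.0)]`, `choose_focus_point(pts, (18, 1))` returns `2` (nearest the subject), while `choose_focus_point(pts)` returns `1` (nearest object).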

In step S308, the system control unit 201 performs an automatic exposure calculation by a predetermined method using the AE image signal read out in step S302, and determines the aperture value (AV value), shutter speed (TV value), ISO sensitivity (ISO value), and so on. These values are determined using a program chart stored in advance. In step S309, the system control unit 201 determines whether the release switch SW2 is ON. If Yes, the process proceeds to step S310; if No, the process returns to step S302.
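
The document does not disclose its program chart. The sketch below only shows the kind of computation such a chart encodes, using the standard APEX relations Ev = Av + Tv = Bv + Sv, with Sv = log2(ISO / 3.125) so that ISO 100 gives Sv = 5. The program line itself (split Ev evenly between Av and Tv within aperture limits, then raise ISO if the shutter would be slower than a handheld limit) and every numeric limit are hypothetical.

```python
import math


def program_exposure(bv, av_min=2.0, av_max=8.0, tv_min=6.0, iso_base=100):
    """Return (Av, Tv, ISO) for a metered brightness value Bv (APEX units).

    Hypothetical program line: split Ev evenly between Av and Tv, clamp Av to the
    lens range (Av 2 = f/2, Av 8 = f/16), and raise ISO if Tv would fall below a
    handheld limit (Tv 6 is about 1/60 s).
    """
    sv = math.log2(iso_base / 3.125)    # APEX speed value; ISO 100 -> Sv = 5
    ev = bv + sv                        # Ev = Bv + Sv
    av = min(max(ev / 2.0, av_min), av_max)
    tv = ev - av                        # Ev = Av + Tv
    iso = iso_base
    if tv < tv_min:                     # shutter too slow: trade the deficit into ISO
        iso = iso_base * 2 ** (tv_min - tv)
        tv = tv_min
    return av, tv, iso
```

For a bright scene, `program_exposure(10)` returns `(7.5, 7.5, 100)`; for a dimmer one, `program_exposure(5)` returns `(5.0, 6.0, 200.0)`, i.e. one stop of ISO gain instead of a slower shutter. A stored chart lets the camera encode such trade-offs as a table rather than a formula.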

In step S310, the system control unit 201 raises the main mirror 103 and the sub-mirror 104 out of the optical path and exposes the image sensor 111 to capture an image. The exposed image sensor 111 generates an image signal and transmits it to the system control unit 201, which generates image data from that signal, stores the image data in the image storage unit 202, and displays the image on the display unit 112.
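
The control flow of FIG. 3 can be summarized in the following sketch. The `cam` object and all of its method names are hypothetical stand-ins for the sensor reads, CNN inference, and lens and mirror control described above, and `FakeCamera` exists only to exercise the loop. As described, a No at S309 returns to S302 without re-checking S301; the text does not say where S310 leads, so the sketch assumes it returns to S301.

```python
def shooting_sequence(cam):
    """Sketch of the FIG. 3 flow (S301-S310) against a hypothetical camera interface.

    Returns the number of still images captured.
    """
    captured = 0
    while cam.sw1() or cam.sw2():                            # S301: SW1 or SW2 on?
        while True:
            ae_img, af_img = cam.read_image_signals()        # S302: AE and AF image signals
            likelihoods = cam.estimate_dictionaries(ae_img)  # S303: likelihood per dictionary
            dicts = cam.select_dictionaries(likelihoods)     # S304: set dictionaries
            subject = cam.detect_subject(ae_img, dicts)      # S305: None if nothing found
            point = cam.select_focus_point(af_img, subject)  # S306: choose focus point
            cam.drive_focus(point)                           # S307: move focusing lens 113
            cam.compute_exposure(ae_img)                     # S308: AV / TV / ISO
            if cam.sw2():                                    # S309: SW2 on?
                break                                        # Yes -> S310; No -> S302
        cam.capture_still()                                  # S310: mirrors up, expose sensor
        captured += 1
    return captured


class FakeCamera:
    """Test stand-in: SW1 stays on until one still is taken; SW2 turns on after N frames."""

    def __init__(self, sw2_after_frames):
        self.sw2_after_frames = sw2_after_frames
        self.frames = 0
        self.shots = 0

    def sw1(self):
        return self.shots == 0

    def sw2(self):
        return self.shots == 0 and self.frames >= self.sw2_after_frames

    def read_image_signals(self):
        self.frames += 1
        return "ae_image", "af_image"

    def estimate_dictionaries(self, ae_img):
        return {"dict2": 0.4, "dict3": 0.8}

    def select_dictionaries(self, likelihoods):
        return [max(likelihoods, key=likelihoods.get)]

    def detect_subject(self, ae_img, dicts):
        return (0.5, 0.5)

    def select_focus_point(self, af_img, subject):
        return 0

    def drive_focus(self, point):
        pass

    def compute_exposure(self, ae_img):
        pass

    def capture_still(self):
        self.shots += 1
```

`shooting_sequence(FakeCamera(3))` runs three preparation frames, captures one still when SW2 turns on, and then exits because both switches are off.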

Next, switching of dictionary data will be described. The system control unit 201 sets dictionary data in the subject detection unit 204 based on the likelihood of each set of dictionary data estimated by the dictionary estimation unit 205. In this embodiment, the system control unit 201 determines, from these likelihoods, the order in which dictionary data is set in the subject detection unit 204 and the period at which each is set. FIG. 4 is a diagram showing an example of switching of dictionary data. In the example of FIG. 4, three sets of dictionary data (dictionary data 1 to dictionary data 3) are assumed to be stored in the dictionary data storage unit 206, although more may be stored.

FIG. 4(a) shows an example in which the dictionary data set in the subject detection unit 204 is switched sequentially. In FIG. 4, "Detection with dictionary n" (n = 1 to 3) indicates the time during which the subject detection unit 204 performs subject detection using dictionary n, and "Image" indicates the time during which image data is acquired. In FIG. 4(a), the dictionary data is switched sequentially and evenly in the cycle dictionary data 1, dictionary data 2, dictionary data 3. Suppose the dictionary data corresponding to the target subject is dictionary data 3. Detection with dictionary data 3 then takes place only after detection with dictionary data 1 and with dictionary data 2, so it takes a long time before dictionary data 3 is used. In particular, as the number of dictionaries stored in the dictionary data storage unit 206 increases, the time until detection is performed with the dictionary corresponding to the target subject can become very long.
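
The waiting-time problem of FIG. 4(a) can be quantified with a simple round-robin model in which each dictionary occupies one detection slot in a fixed rotation. This is a sketch; the slot model and the function names are illustrative.

```python
from itertools import cycle, islice


def round_robin(dictionaries, n_slots):
    """FIG. 4(a) model: switch dictionaries in a fixed, equal rotation."""
    return list(islice(cycle(dictionaries), n_slots))


def first_slot_of(schedule, target):
    """Number of detection slots that pass before `target` is first used."""
    return schedule.index(target)
```

`first_slot_of(round_robin([1, 2, 3], 6), 3)` is `2`: two detection slots pass before dictionary 3 is used. With 20 dictionaries and the target last in the rotation, 19 slots pass, which is the scaling problem described above.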

Therefore, in this embodiment, the system control unit 201 switches dictionary data according to the likelihood of each set of dictionary data stored in the dictionary data storage unit 206. The system control unit 201 excludes from the candidates to be set in the subject detection unit 204 any dictionary data whose likelihood, as estimated by the dictionary estimation unit 205, is less than a predetermined threshold. When switching among the remaining dictionary data, the system control unit 201 gives dictionary data with a higher estimated likelihood higher priority in the setting order and a shorter switching period.

In other words, the system control unit 201 schedules the switching so that the higher the likelihood of a set of dictionary data, the earlier it is set in the subject detection unit 204 and the shorter the period at which it is set again. The system control unit 201 may control only one of the order and the switching period, or both; this embodiment is described as controlling both.

FIG. 4(b) shows an example of the dictionary data switching control performed by the system control unit 201. Suppose the dictionary estimation unit 205 estimates the likelihood of dictionary data 1 as 0%, that of dictionary data 2 as 40%, and that of dictionary data 3 as 80%, and that the predetermined threshold is set to 30%. The system control unit 201 excludes dictionary data 1, whose likelihood is below the threshold, from the candidates to be set in the subject detection unit 204, and therefore switches only between dictionary data 2 and dictionary data 3. Excluding low-likelihood dictionary data in this way prevents dictionary data unsuited to the target subject from being used, which reduces false detection of the subject.

As shown in FIG. 4(b), the system control unit 201 first sets dictionary data 3, which has the highest likelihood of the three, in the subject detection unit 204, and then sets dictionary data 2, which has the next highest likelihood; this is the order control. The system control unit 201 then sets dictionary data 3 in the subject detection unit 204 consecutively; this is the period control. In other words, when switching the dictionary data set in the subject detection unit 204, the system control unit 201 moves the high-likelihood dictionary data 3 earlier in the order and shortens its switching period. Because dictionary data 3 has the highest estimated likelihood, it is the most likely to suit the target subject, so this order and period control shortens the time needed to detect the target subject.
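
The document specifies the behavior (exclude dictionaries below the threshold; schedule more likely ones earlier and more often) but not a concrete scheduling algorithm. The sketch below realizes that behavior with smooth weighted round-robin, a swapped-in general-purpose interleaving technique, using each likelihood as the weight. The function name and the integer-percent input are assumptions; integer percentages keep tie-breaking exact, which floating-point weights such as 0.4 and 0.8 would not guarantee.

```python
def likelihood_schedule(likelihood_pct, threshold_pct, n_slots):
    """Exclude, order and interleave dictionaries by likelihood.

    likelihood_pct: {dictionary: likelihood in percent (int)}
    Dictionaries below threshold_pct are never scheduled. The rest are visited by
    smooth weighted round-robin with weight = likelihood, so a more likely
    dictionary both comes first and recurs more often (shorter switching period).
    """
    active = sorted((d for d, p in likelihood_pct.items() if p >= threshold_pct),
                    key=lambda d: likelihood_pct[d], reverse=True)
    if not active:
        return []
    total = sum(likelihood_pct[d] for d in active)
    credit = {d: 0 for d in active}
    schedule = []
    for _ in range(n_slots):
        for d in active:
            credit[d] += likelihood_pct[d]               # every candidate earns its weight
        pick = max(active, key=lambda d: credit[d])      # ties go to the more likely one
        credit[pick] -= total                            # the chosen one pays the total
        schedule.append(pick)
    return schedule
```

With the FIG. 4(b) values and a 30% threshold, `likelihood_schedule({1: 0, 2: 40, 3: 80}, 30, 6)` yields `[3, 2, 3, 3, 2, 3]`: dictionary 1 is never used, dictionary 3 comes first, and dictionary 3 is used twice as often as dictionary 2, sometimes twice in a row. Whether this matches the exact sequence drawn in FIG. 4(b) cannot be confirmed from the text.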

The system control unit 201 may also change, according to the likelihood of each set of dictionary data, the threshold (reliability threshold) that the subject detection unit 204 uses to accept a detected subject. For example, when the dictionary data set in the subject detection unit 204 has a low likelihood, the system control unit 201 sets a high reliability threshold, making detections with low-likelihood dictionary data less likely. Conversely, when the dictionary data set in the subject detection unit 204 has a high likelihood, the system control unit 201 sets a low reliability threshold, making it easier for the subject detection unit 204 to detect a subject with high-likelihood dictionary data.
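
The document states only the direction of this adjustment: a low-likelihood dictionary gets a higher reliability threshold and a high-likelihood dictionary a lower one. Linear interpolation between two bounds is one simple way to realize it; the bounds 0.5 and 0.9 and the function names are hypothetical.

```python
def reliability_threshold(likelihood, t_low=0.5, t_high=0.9):
    """Lower the detection-reliability threshold as dictionary likelihood rises.

    likelihood is clamped to [0, 1]; t_low and t_high are hypothetical tuning values.
    """
    likelihood = min(max(likelihood, 0.0), 1.0)
    return t_high - (t_high - t_low) * likelihood


def accept_detection(reliability, likelihood):
    """Keep a detection only if its reliability clears the likelihood-dependent bar."""
    return reliability >= reliability_threshold(likelihood)
```

A detection with reliability 0.7 is accepted under a dictionary with likelihood 0.8 (threshold about 0.58) but rejected under one with likelihood 0.4 (threshold about 0.74).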

Furthermore, the subject detection unit 204 may fail to detect a subject even though dictionary data with a high estimated likelihood has been set in it. In that case, the system control unit 201 may process the image input to the subject detection unit 204. One applicable processing method is trimming, which makes the subject easier to detect. For example, the system control unit 201 may trim the input image data around the image center. This enlarges the subject relative to the image data, making it easier for the subject detection unit 204 to detect the subject.
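A minimal sketch of the center-trimming fallback follows, using a plain list of pixel rows as the image. `center_crop`, `detect_with_fallback` and the `scale` parameter are hypothetical names. In practice the crop would usually be resized back to the detector's input resolution, which is what makes the subject appear larger. That resize step is not shown here.

```python
def center_crop(image, scale):
    """Crop the central region of `image` (a list of equal-length rows).

    `scale` in (0, 1] is the fraction of height and width to keep.
    Returns (cropped_image, (top, left)) so detections can be mapped back.
    """
    if not 0 < scale <= 1:
        raise ValueError("scale must be in (0, 1]")
    h, w = len(image), len(image[0])
    ch, cw = max(1, round(h * scale)), max(1, round(w * scale))
    top, left = (h - ch) // 2, (w - cw) // 2
    return [row[left:left + cw] for row in image[top:top + ch]], (top, left)


def detect_with_fallback(detect, image, scale=0.5):
    """Run `detect`; if it finds nothing (returns None), retry on a center crop.

    Returns (result, (top, left)); the offset is (0, 0) when no crop was needed.
    """
    result = detect(image)
    if result is not None:
        return result, (0, 0)
    cropped, offset = center_crop(image, scale)
    return detect(cropped), offset


if __name__ == "__main__":
    img = [[r * 4 + c for c in range(4)] for r in range(4)]
    print(center_crop(img, 0.5))  # central 2x2 block and its (top, left) offset
```

Any position returned by the second pass is in cropped-image coordinates. It would need to be shifted by the returned `(top, left)` offset to refer to the original frame.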

As described above, the dictionary estimation unit 205 estimates, for each dictionary data, the likelihood that a subject is present. It does this to estimate which of multiple types of subjects is present in the captured image data. The subject detection unit 204, on the other hand, estimates where in the image data the subject is located. A subject moving within the frame changes its position from frame to frame while its type stays the same. The estimation results of the subject detection unit 204 are therefore more likely to change over time than those of the dictionary estimation unit 205. For this reason, the processing cycle of the subject detection unit 204 is preferably shorter than that of the dictionary estimation unit 205.
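One way to realize the two processing cycles is a frame loop that re-estimates likelihoods only every N frames and runs detection on every frame. The sketch below assumes that approach. `run_pipeline`, `estimate`, `detect` and `estimate_every` are hypothetical. For brevity, the sketch simply uses the highest-likelihood dictionary rather than a switching schedule.

```python
def run_pipeline(frames, estimate, detect, estimate_every=5):
    """Detect on every frame; re-estimate dictionary likelihoods every `estimate_every` frames.

    estimate(frame) -> {dictionary_id: likelihood}   (slow-changing: which subject types)
    detect(frame, dictionary_id) -> detection result (fast-changing: where the subject is)
    """
    if estimate_every < 1:
        raise ValueError("estimate_every must be >= 1")
    likelihoods = None
    results = []
    for i, frame in enumerate(frames):
        if i % estimate_every == 0:
            likelihoods = estimate(frame)  # longer processing cycle
        dictionary_id = max(likelihoods, key=likelihoods.get)
        results.append(detect(frame, dictionary_id))  # shorter processing cycle
    return results
```

Over 10 frames with `estimate_every=5`, `estimate` runs twice (frames 0 and 5) while `detect` runs ten times.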

As described above, the subject detection unit 204 and the dictionary estimation unit 205 are each implemented by a CNN trained by machine learning (a trained CNN). The subject detection unit 204 takes image data as input, estimates the position, size, reliability, and so on of the subject, and outputs the estimated information. The dictionary estimation unit 205 takes image data as input, estimates the likelihood of each dictionary data, and outputs the estimated likelihoods. The CNN may be, for example, a network in which convolution layers and pooling layers are alternately stacked, followed by a fully connected layer and an output layer. In this case, the CNN may be trained by, for example, backpropagation. Alternatively, the CNN may be a neocognitron-type CNN built from pairs of a feature detection layer (S layer) and a feature integration layer (C layer). In this case, a learning method called "Add-if Silent" may be used to train the CNN.
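The following toy forward pass makes the first layer arrangement concrete. It uses a single channel and alternates convolution and pooling layers, followed by a fully connected output layer. Each output passes through a sigmoid, giving one value per dictionary as a stand-in for the likelihoods output by the dictionary estimation unit 205. The weights are fixed placeholders, not trained values. ReLU, max pooling and the per-dictionary sigmoid are assumptions, since the source names none of them. Training by backpropagation is not shown.

```python
import math


def conv2d(x, k):
    """'Valid' 2-D convolution, implemented as cross-correlation (no kernel flip), as CNN libraries do."""
    kh, kw = len(k), len(k[0])
    return [
        [sum(x[i + a][j + b] * k[a][b] for a in range(kh) for b in range(kw))
         for j in range(len(x[0]) - kw + 1)]
        for i in range(len(x) - kh + 1)
    ]


def relu(x):
    return [[max(0.0, v) for v in row] for row in x]


def max_pool(x, s=2):
    """Non-overlapping s x s max pooling; rows/cols that don't fill a window are dropped."""
    return [
        [max(x[i + a][j + b] for a in range(s) for b in range(s))
         for j in range(0, len(x[0]) - s + 1, s)]
        for i in range(0, len(x) - s + 1, s)
    ]


def fully_connected(v, weights, biases):
    return [sum(w * xi for w, xi in zip(row, v)) + b for row, b in zip(weights, biases)]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def forward(image, kernels, fc_weights, fc_biases):
    """(conv -> ReLU -> pool) once per kernel, then a fully connected output layer.

    Returns one presence likelihood in (0, 1) per output unit (per dictionary).
    """
    x = image
    for k in kernels:
        x = max_pool(relu(conv2d(x, k)))
    flat = [v for row in x for v in row]
    return [sigmoid(z) for z in fully_connected(flat, fc_weights, fc_biases)]


if __name__ == "__main__":
    img = [[1.0] * 10 for _ in range(10)]  # 10x10 -> 8x8 -> 4x4 -> 2x2 -> 1x1
    k = [[0.1] * 3 for _ in range(3)]
    print(forward(img, [k, k], [[1.0], [0.5], [-1.0]], [0.0, 0.0, 0.0]))
```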

The CNN of the subject detection unit 204 and the CNN of the dictionary estimation unit 205 may differ in their learned coefficient parameters and the like. In this case, the learned coefficient parameters may be switched between those of the subject detection unit 204 and those of the dictionary estimation unit 205. The two CNNs may also have different network configurations.

Trained models other than a trained CNN may also be used for the subject detection unit 204 and the dictionary estimation unit 205. For example, a trained model generated by machine learning, such as a support vector machine or a decision tree, may be applied to these units. The subject detection unit 204 and the dictionary estimation unit 205 also need not be trained models generated by machine learning. For example, any subject detection method that does not use machine learning may be applied to the subject detection unit 204, and a function that outputs a likelihood, or the like, may be applied to the dictionary estimation unit 205.

As described above, based on the estimation by the dictionary estimation unit 205, dictionary data suitable for the target subject can be selected from the plurality of dictionary data and set in the subject detection unit 204. The subject detection unit 204 then performs subject detection using that dictionary data. This shortens the time required to detect the target subject and suppresses erroneous detection of the subject. As a result, dictionary data can be selected efficiently when performing subject detection.

Although preferred embodiments of the present invention have been described above, the present invention is not limited to these embodiments, and various modifications and changes are possible within the scope of its gist. The present invention can also be realized by supplying a program that implements one or more functions of the above embodiments to a system or apparatus via a network or storage medium, and having one or more processors of a computer in that system or apparatus read and execute the program. The present invention can also be realized by a circuit (e.g., an ASIC) that implements one or more functions.

100 Imaging device
111 Imaging sensor
201 System control unit
204 Subject detection unit
205 Dictionary estimation unit
206 Dictionary data storage unit

Claims

1. An image processing apparatus comprising:
an estimation means for estimating, based on captured image data, a likelihood of the presence or absence of a subject for each of a plurality of dictionary data used for detecting the subject;
a subject detection means for detecting the subject from the image data by using dictionary data set according to the estimated likelihood; and
a control means for switching among the plurality of dictionary data to be set in the subject detection means,
wherein the control means switches among the plurality of dictionary data other than dictionary data whose likelihood is less than a predetermined threshold value.

2. An image processing apparatus comprising:
an estimation means for estimating, based on captured image data, a likelihood of the presence or absence of a subject for each of a plurality of dictionary data used for detecting the subject;
a subject detection means for detecting the subject from the image data by using dictionary data set according to the estimated likelihood; and
a control means for switching among the plurality of dictionary data to be set in the subject detection means,
wherein the control means changes, in accordance with the likelihood, either or both of an order and a cycle of switching among the plurality of dictionary data set in the subject detection means.

3. The image processing apparatus according to claim 1, wherein the control means changes, in accordance with the likelihood, either or both of an order and a cycle of switching among the plurality of dictionary data set in the subject detection means.

4. The image processing apparatus according to claim 1, wherein the higher the likelihood, the earlier the control means places the dictionary data in the switching order.

5. The image processing apparatus according to claim 1, wherein the higher the likelihood, the shorter the control means makes the switching cycle of the dictionary data.

6. The image processing apparatus according to claim 1, wherein the higher the likelihood, the earlier the control means places the dictionary data in the switching order and the shorter it makes the switching cycle.

7. The image processing apparatus according to claim 1, wherein the control means changes, in accordance with the likelihood, a reliability threshold used by the subject detection means to detect the subject.

8. The image processing apparatus according to claim 1, wherein trimming processing is performed on the image data when the subject is not detected by the subject detection means.

9. The image processing apparatus according to claim 1, wherein a processing cycle of said subject detection means is shorter than a processing cycle of said estimation means.

10. The image processing apparatus according to claim 1, wherein the estimation means estimates the likelihood using a trained model, and the subject detection means detects the subject using a trained model different from that trained model.

11. The image processing apparatus according to claim 10, wherein the trained model of the estimation means and the trained model of the subject detection means are convolutional neural networks.

12. An imaging apparatus comprising:
an imaging unit; and
the image processing apparatus according to any one of claims 1 to 11.

13. An image processing method comprising:
a step of estimating, based on captured image data, a likelihood of the presence or absence of a subject for each of a plurality of dictionary data used for detecting the subject;
a step of detecting the subject from the image data by using dictionary data set according to the estimated likelihood; and
a control step of switching among the plurality of dictionary data to be set in the step of detecting the subject from the image data,
wherein the control step switches among the plurality of dictionary data other than dictionary data whose likelihood is less than a predetermined threshold value.

14. An image processing method comprising:
a step of estimating, based on captured image data, a likelihood of the presence or absence of a subject for each of a plurality of dictionary data used for detecting the subject;
a step of detecting the subject from the image data by using dictionary data set according to the estimated likelihood; and
a control step of switching among the plurality of dictionary data to be set in the step of detecting the subject from the image data,
wherein the control step changes, in accordance with the likelihood, either or both of an order and a cycle of switching among the plurality of dictionary data set in the step of detecting the subject from the image data.

15. A program for causing a computer to execute the functions of each means of the image processing apparatus according to claim 1.