JP7092615B2

JP7092615B2 - Shadow detector, shadow detection method, shadow detection program, learning device, learning method, and learning program

Info

Publication number: JP7092615B2
Application number: JP2018157457A
Authority: JP
Inventors: 陽介野中
Original assignee: Secom Co Ltd
Current assignee: Secom Co Ltd
Priority date: 2018-08-24
Filing date: 2018-08-24
Publication date: 2022-06-28
Anticipated expiration: 2038-08-24
Also published as: JP2020030750A

Description

The present invention relates to a shadow detection technique for detecting, in a captured image, a shadow region in which a shadow appears, and to a learning technique for obtaining a trained model that outputs a shadow degree for a local region of the captured image from the image features of that region.

When an object such as a person is detected in a captured image that contains both shaded and sunlit areas, detection accuracy can drop because of insufficient contrast in the shaded areas. To prevent this, countermeasures are taken such as detecting the shadow region in the captured image and correcting its contrast.

For example, the object detection device described in Patent Document 1 analyzes the luminance distribution of a human candidate region in a captured image. If the distribution is unimodal and similar to the luminance distribution of the background image, the entire human candidate region is determined to be a shadow region. If the distribution is bimodal, the region having the luminance values of the lower-luminance peak is determined to be the shadow region. Contrast correction is then applied to the shadow region to prevent a loss of detection accuracy.

Patent Document 1: Japanese Unexamined Patent Publication No. 2011-028348

However, the conventional technique has the problem that a shadow region can be difficult to determine in a captured image of a space with a complex background.

For example, when the background consists of two kinds of material, the luminance distribution can be bimodal because of the difference between the materials. In other words, the luminance distribution is bimodal even when no shadow is present, so the region of the lower-luminance material is erroneously determined to be a shadow region.

The present invention has been made to solve the above problem. Its object is to provide a shadow detection technique that can detect a shadow region with high accuracy from a captured image, even when that image shows a space with a complex background, such as one containing a plurality of background components with different reflection characteristics.

Another object of the present invention is to provide a learning technique for training a learning model that can output, with high accuracy, how likely it is that a shadow appears in the image of a local region of a captured image, even when the captured image shows a space with a complex background.

(1) A shadow detection device according to the present invention detects a shadow region, in which a shadow appears, in a captured image of a predetermined space. The device comprises: trained model storage means that stores a trained model which receives image features of a local region set in the captured image and outputs a shadow degree representing how likely it is that a shadow appears in that local region, the trained model having been trained using image features of local regions set in a learning captured image of the space and a learning shadow degree, which is the shadow degree estimated for each such local region; and shadow determination means that compares the shadow degree, obtained by inputting the image features of the local region of the captured image into the trained model, with a predetermined criterion and determines a local region whose shadow degree exceeds the criterion to be the shadow region.

(2) In the shadow detection device according to (1) above, the trained model storage means may store a trained model for each characteristic-similar region, that is, for each region in which the reflection characteristics of the background components that can appear in the captured image are similar, the learning having been performed separately for each characteristic-similar region.

(3) The shadow detection device according to (2) above may further comprise background information storage means that stores the characteristic-similar regions, and the shadow determination means may obtain the shadow degree by inputting the image features of the local region into the trained model of the characteristic-similar region corresponding to that local region.

(4) The shadow detection device according to (1) above may further comprise background information storage means that stores characteristic-similar regions in which the reflection characteristics of the background components that can appear in the captured image are similar. In this configuration, the trained model storage means stores a trained model that was trained using, in addition to the image features and the learning shadow degree of each local region set in the learning captured image, attribution information indicating the characteristic-similar region to which that local region belongs, and the shadow determination means obtains the shadow degree by inputting the image features of the local region and the attribution information for that local region into the trained model.

(5) A learning device according to the present invention trains a learning model that receives image features of a local region set in a captured image of a predetermined space and outputs a shadow degree representing how likely it is that a shadow appears in that local region. The device comprises: learning model storage means that stores the learning model; learning data storage means that stores at least image features of local regions set in a learning captured image of the space and a learning shadow degree, which is the shadow degree estimated for each such local region; and learning means that inputs at least the image features of a local region of the learning captured image into the learning model and performs learning that updates the learning model based on the error of the obtained shadow degree with respect to the learning shadow degree of that local region.

(6) In the learning device according to (5) above, the learning model storage means may store a learning model for each characteristic-similar region in which the reflection characteristics of the background components that can appear in the captured image are similar, and the learning means may perform the learning on the learning model of each characteristic-similar region.

(7) A shadow detection method according to the present invention detects a shadow region, in which a shadow appears, in a captured image of a predetermined space. The method uses a trained model which receives image features of a local region set in the captured image and outputs a shadow degree representing how likely it is that a shadow appears in that local region, the trained model having been trained using image features of local regions set in a learning captured image of the space and a learning shadow degree, which is the shadow degree estimated for each such local region. The method includes: an input step of inputting the image features of the local region of the captured image into the trained model; and a shadow determination step of comparing the shadow degree output from the trained model with a predetermined criterion and determining a local region whose shadow degree exceeds the criterion to be the shadow region.

(8) A shadow detection program according to the present invention causes a computer to perform processing for detecting a shadow region, in which a shadow appears, in a captured image of a predetermined space. The program causes the computer to function as: trained model storage means that stores a trained model which receives image features of a local region set in the captured image and outputs a shadow degree representing how likely it is that a shadow appears in that local region, the trained model having been trained using image features of local regions set in a learning captured image of the space and a learning shadow degree, which is the shadow degree estimated for each such local region; and shadow determination means that compares the shadow degree, obtained by inputting the image features of the local region of the captured image into the trained model, with a predetermined criterion and determines a local region whose shadow degree exceeds the criterion to be the shadow region.

(9) A learning method according to the present invention trains a learning model that receives image features of a local region set in a captured image of a predetermined space and outputs a shadow degree representing how likely it is that a shadow appears in that local region. The method uses learning model storage means that stores the learning model, and learning data storage means that stores at least image features of local regions set in a learning captured image of the space and a learning shadow degree, which is the shadow degree estimated for each such local region. The method includes: an input step of inputting at least the image features of a local region of the learning captured image into the learning model; and a learning step of performing learning that updates the learning model based on the error of the shadow degree output from the learning model with respect to the learning shadow degree of that local region.

(10) A learning program according to the present invention causes a computer to perform processing for training a learning model that receives image features of a local region set in a captured image of a predetermined space and outputs a shadow degree representing how likely it is that a shadow appears in that local region. The program causes the computer to function as: learning model storage means that stores the learning model; learning data storage means that stores at least image features of local regions set in a learning captured image of the space and a learning shadow degree, which is the shadow degree estimated for each such local region; and learning means that inputs at least the image features of a local region of the learning captured image into the learning model and performs learning that updates the learning model based on the error of the obtained shadow degree with respect to the learning shadow degree of that local region.
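As an illustration of the learning and determination described in (1) and (5), the following is a minimal sketch, not the model of the embodiment. It assumes a logistic-regression model, a single hypothetical image feature (the luminance ratio of a pixel to the background), a criterion of 0.5, and toy learning data; the function names are likewise illustrative.

```python
import math

def shadow_degree(weights, bias, features):
    """Return a shadow degree in (0, 1) for one local region's feature vector."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, n_features, epochs=2000, lr=0.5):
    """Update the model from (image features, learning shadow degree) pairs.

    Each update moves the parameters against the error between the model
    output and the learning shadow degree of that local region.
    """
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        for features, target in samples:
            error = shadow_degree(weights, bias, features) - target
            weights = [w - lr * error * x for w, x in zip(weights, features)]
            bias -= lr * error
    return weights, bias

def is_shadow(weights, bias, features, criterion=0.5):
    """A local region is a shadow region when its shadow degree exceeds the criterion."""
    return shadow_degree(weights, bias, features) > criterion

# Toy learning data (assumed): dark pixels are labeled shadow (1.0), bright ones not (0.0).
samples = [([0.3], 1.0), ([0.4], 1.0), ([0.9], 0.0), ([1.0], 0.0)]
weights, bias = train(samples, n_features=1)
print(is_shadow(weights, bias, [0.35]), is_shadow(weights, bias, [0.95]))
```

Keeping the criterion as a separate argument mirrors the claim wording, in which the comparison with the predetermined criterion is performed by the shadow determination means rather than by the model itself.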

According to the shadow detection technique of the present invention, whether each local region of a captured image is a shadow region is determined based on learning that uses captured images, so a shadow region can be detected with high accuracy even in a captured image of a space with a complex background.

Further, the learning technique of the present invention uses captured images to learn the shadow degree, which represents how likely it is that a shadow appears in the image of a local region of a captured image. It can therefore train a learning model that outputs the shadow degree of a local region with high accuracy even for a captured image of a space with a complex background.

A block diagram showing the schematic configuration of the image monitoring device according to the embodiment of the present invention.
A schematic functional block diagram of the image monitoring device according to the embodiment of the present invention when it functions as a learning device.
A schematic diagram showing an example of a reflection characteristic map.
A schematic diagram showing an example of a learning captured image and an example of a rendered image of the monitored space.
A schematic diagram explaining the learning data.
A schematic functional block diagram of the image monitoring device according to the embodiment of the present invention when it functions as a shadow detection device.
A schematic flowchart explaining the operation of the image monitoring device according to the embodiment of the present invention.
A schematic flowchart of the shadow determination model learning process.
A schematic flowchart of the shadow determination process.

An image monitoring device 1 according to a mode for carrying out the present invention (hereinafter, the embodiment) is described below with reference to the drawings. The image monitoring device 1 analyzes an image (captured image) of a predetermined space (monitored space) to determine, for example, whether a monitoring target such as a person or a suspicious object is present in that space. In particular, the image monitoring device 1 includes the shadow detection device according to the present invention, which detects shadow regions in the captured image, and uses the shadow region information in the image processing that detects the monitoring target. The image monitoring device 1 also includes the learning device according to the present invention and can generate the trained model used by the shadow detection device.

[Configuration of the image monitoring device]
FIG. 1 is a block diagram showing the schematic configuration of the image monitoring device 1. The image monitoring device 1 comprises a camera 2, a communication unit 3, a storage unit 4, an image processing unit 5, and a notification unit 6.

The camera 2 is a surveillance camera connected to the image processing unit 5 via the communication unit 3. It is imaging means that captures the monitored space at predetermined time intervals to generate captured images and inputs them sequentially to the image processing unit 5. For example, the camera 2 is mounted on a pole installed in a corner of an event venue, which is the monitored space, with a predetermined fixed field of view overlooking that space, and captures the monitored space at a frame period of 1 second to generate color images. The camera 2 may generate monochrome images instead of color images.

The communication unit 3 is a communication circuit. One end is connected to the image processing unit 5, and the other end is connected to the camera 2 and the notification unit 6. The communication unit 3 acquires captured images from the camera 2 and inputs them to the image processing unit 5, and outputs the analysis results received from the image processing unit 5 to the notification unit 6.

For example, when the camera 2 and the notification unit 6 are installed in a monitoring center inside the event venue, and the communication unit 3, the storage unit 4, and the image processing unit 5 are installed in a remote image analysis center, the communication unit 3 can be connected to the camera 2 and to the notification unit 6 over Internet lines, and the communication unit 3 can be connected to the image processing unit 5 by a bus. Alternatively, when all units are installed in the same building, for example, the communication unit 3 and the camera 2 are connected by a coaxial cable or a LAN (Local Area Network), the communication unit 3 and the notification unit 6 by a display cable, and the communication unit 3 and the image processing unit 5 by a bus. In short, the units are connected in whatever form suits their installation locations.

The storage unit 4 is a memory device such as a ROM (Read Only Memory) or a RAM (Random Access Memory) and stores various programs and data. The storage unit 4 is connected to the image processing unit 5 and exchanges this information with it.

The image processing unit 5 is composed of an arithmetic device such as a CPU (Central Processing Unit), a DSP (Digital Signal Processor), or an MCU (Micro Control Unit). The image processing unit 5 operates as various processing and control means by reading programs from the storage unit 4 and executing them. As necessary, it reads data from the storage unit 4 and stores generated data in the storage unit 4. The image processing unit 5 also generates, from the captured images acquired from the camera 2 via the communication unit 3, analysis results concerning the presence and position of monitoring targets in the monitored space, and outputs them to the notification unit 6 via the communication unit 3. The image processing unit 5 may additionally output the captured image or a contrast-corrected image to the notification unit 6.

The notification unit 6 is a display device such as a liquid crystal display or a CRT (Cathode Ray Tube) display. It notifies the monitoring staff by displaying information contained in the analysis results received from the communication unit 3, such as the presence and position of monitoring targets. The notification unit 6 may also include a buzzer, a lamp, or the like to draw attention more strongly. The monitoring staff view the displayed analysis results and images, judge whether a response is needed, and respond as necessary, for example by dispatching response personnel.

This embodiment illustrates an image monitoring device 1 with one camera 2 for each pair of communication unit 3 and image processing unit 5. In another embodiment, two or more cameras 2 may be connected to one pair of communication unit 3 and image processing unit 5. In that case, the communication unit 3 receives the captured images from the cameras 2 by time division, and the image processing unit 5 processes the captured images from the cameras 2 by time division or in parallel.

[Functions of the image monitoring device during learning]
FIG. 2 is a schematic functional block diagram of the image monitoring device 1 when it functions as the learning device according to the present invention. FIG. 2 mainly shows the functions of the communication unit 3, the storage unit 4, and the image processing unit 5. Specifically, the communication unit 3 functions as captured image acquisition means 30 and the like; the storage unit 4 functions as environment model storage means 40, camera information storage means 41, background information storage means 42, learning data storage means 43, trained model storage means 44, and the like; and the image processing unit 5 functions as background information generation means 50, learning data generation means 51, learning means 52, and the like.

The captured image acquisition means 30 sequentially acquires captured images from the camera 2 and sequentially outputs them to the background information generation means 50 and the learning data generation means 51. For convenience of explanation, a captured image used for learning is called a learning captured image and a captured image used for determination is called a determination captured image, where the distinction is needed.

The environment model storage means 40 stores, as a three-dimensional background, three-dimensional models of the plurality of components (background components) that make up the background of the monitored space.

Outdoors, for example, the background components are built structures such as sidewalks, roads, buildings, and signs, and immobile natural objects such as trees. Preferably, parts whose reflection characteristics differ significantly from each other are stored as separate background components, such as the asphalt part and the white-line part of a road, or the base-color part and the text or symbol part of a sign.

The three-dimensional model of the background components contains three-dimensional coordinate values expressing the position, orientation, and three-dimensional shape of each background component in an XYZ coordinate system that models the monitored space, together with data on the reflection characteristics of each background component. Reflection characteristics generally consist of elements such as the color, texture, and reflectance of the component's surface. The reflectance is expressed, for example, by a dichromatic reflection model whose parameters are the reflectance of the specular reflection component, the reflectance of the diffuse reflection component, and the ratio between them.
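To make the three reflectance parameters concrete, here is a minimal sketch of one possible dichromatic reflection computation. The source only names the parameters; the Lambertian diffuse term, the Phong-style specular lobe, the `shininess` exponent, and all function names are assumptions made for illustration.

```python
import math

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _normalize(v):
    length = math.sqrt(_dot(v, v))
    return tuple(x / length for x in v)

def dichromatic_intensity(normal, to_light, to_viewer,
                          diffuse_reflectance, specular_reflectance,
                          specular_ratio, shininess=20.0):
    """Blend a diffuse and a specular term for one surface point.

    specular_ratio is the share of the specular component (0 to 1);
    the remaining share goes to the diffuse component.
    """
    n, l, v = _normalize(normal), _normalize(to_light), _normalize(to_viewer)
    n_dot_l = max(0.0, _dot(n, l))
    if n_dot_l == 0.0:
        return 0.0  # the light does not reach this side of the surface
    diffuse = diffuse_reflectance * n_dot_l
    # Mirror direction of the light about the normal: r = 2 (n . l) n - l
    mirror = tuple(2.0 * n_dot_l * ni - li for ni, li in zip(n, l))
    specular = specular_reflectance * max(0.0, _dot(mirror, v)) ** shininess
    return (1.0 - specular_ratio) * diffuse + specular_ratio * specular

# Light and viewer both directly above a horizontal surface (hypothetical values).
value = dichromatic_intensity((0, 0, 1), (0, 0, 1), (0, 0, 1),
                              diffuse_reflectance=0.6,
                              specular_reflectance=0.9,
                              specular_ratio=0.25)
```

Two materials with different parameter values return different intensities under identical lighting, which is why a two-material background can produce a bimodal luminance distribution without any shadow.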

The three-dimensional model of the background components can be obtained from building information in the IFC (Industry Foundation Classes) standard created at the time of architectural design, from three-dimensional CAD data and the like, or from actual measurements made in advance.

The environment model storage means 40 also stores in advance an illumination model of the monitored space. For each of the one or more light sources that illuminate the monitored space, the illumination model contains the position of the light source in the XYZ coordinate system that models the monitored space and its illumination characteristics, expressed by the light distribution, color temperature, and so on. The light sources are artificial lighting, the sun, and the like.

The camera information storage means 41 stores in advance the camera parameters of the camera 2 in the XYZ coordinate system that models the monitored space. The camera parameters consist of extrinsic parameters and intrinsic parameters. The extrinsic parameters are the position and orientation of the camera 2 in the XYZ coordinate system. The intrinsic parameters are the focal length, center coordinates, distortion coefficients, and so on of the camera 2. The camera parameters are measured by calibration in advance and stored in the camera information storage means 41. Applying these camera parameters to a pinhole camera model converts coordinates in the XYZ coordinate system into the xy coordinate system that represents the imaging plane of the camera 2.
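The conversion from XYZ coordinates to xy image coordinates can be sketched as follows. This is a simplified pinhole projection that assumes a single focal length in pixels and ignores the distortion coefficients; the function name, argument layout, and numeric values are illustrative, not taken from the source.

```python
def project_point(point_xyz, rotation, translation, focal_length, center):
    """Project a point in the XYZ coordinate system onto the imaging plane.

    rotation (3x3, row-major) and translation (3-vector) are the extrinsic
    parameters mapping XYZ coordinates into camera coordinates; focal_length
    and center are intrinsic parameters. Returns None when the point lies
    behind the camera.
    """
    cam = [
        sum(rotation[i][j] * point_xyz[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]
    if cam[2] <= 0:
        return None
    x = focal_length * cam[0] / cam[2] + center[0]
    y = focal_length * cam[1] / cam[2] + center[1]
    return (x, y)

# Hypothetical parameters: camera at the origin looking along +Z.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
pixel = project_point((1.0, 0.5, 4.0), identity, (0.0, 0.0, 0.0),
                      focal_length=800.0, center=(320.0, 240.0))
```

A production implementation would also apply the lens distortion model from the calibration before adding the center offset.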

The background information storage means 42 stores characteristic-similar regions. A characteristic-similar region is a set of local regions, in a captured image of the background of the monitored space (background image), in which the reflection characteristics of the background components are similar. The local regions are set in advance. In this embodiment, each pixel of the captured image is set as a local region. A local region may instead be a region made up of several pixels, such as a 2×2-pixel or 3×3-pixel region. By referring to the characteristic-similar regions, the reflection characteristics of the background component that can appear as background in any local region of a learning captured image or of a determination captured image can be identified.
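One simple way to hold characteristic-similar regions is a per-pixel map of reflection characteristic IDs. The sketch below assumes such a map as a nested list, with each pixel as one local region; the map contents and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical 4x3 map: each entry is the reflection characteristic ID of the
# background component seen at that pixel (e.g. 0 = asphalt, 1 = white line, 2 = wall).
reflection_map = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 2, 2],
]

def characteristic_similar_regions(id_map):
    """Group pixel coordinates (x, y) by reflection characteristic ID."""
    regions = defaultdict(list)
    for y, row in enumerate(id_map):
        for x, region_id in enumerate(row):
            regions[region_id].append((x, y))
    return dict(regions)

def region_id_at(id_map, x, y):
    """Look up which characteristic-similar region a local region belongs to."""
    return id_map[y][x]

regions = characteristic_similar_regions(reflection_map)
```

With such a lookup, a per-region trained model as in (2) and (3) could be selected by the ID of the local region being judged, or the ID could be passed to a single model as the attribution information in (4).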

The background information storage means 42 also stores the regions of the learning captured image in which a shadow is estimated to appear (estimated shadow regions). The estimated shadow region information is converted into a learning shadow degree, which represents how likely it is that a shadow appears in each local region of the learning captured image, and is used as teacher data in learning.

The background information storage means 42 further stores a background image, which represents the image of the background components that can appear in a captured image. The background image is compared with the learning captured image and with the determination captured image and is used to extract, from each captured image, the region (foreground region) in which something other than a background component (a foreground object) appears.
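The comparison with the background image can be illustrated by simple background subtraction. The sketch assumes grayscale images stored as nested lists and a hypothetical fixed difference threshold; the source does not specify the comparison method at this point.

```python
def foreground_mask(captured, background, threshold=30):
    """Mark pixels whose luminance differs from the background image by more than threshold."""
    return [
        [abs(c - b) > threshold for c, b in zip(captured_row, background_row)]
        for captured_row, background_row in zip(captured, background)
    ]

# Hypothetical 3x2 grayscale images (0 to 255).
background = [[100, 100, 100],
              [100, 100, 100]]
captured = [[102, 180, 100],
            [ 60,  99, 100]]
mask = foreground_mask(captured, background)
```

Note that a plain difference threshold marks a darkened background pixel as foreground as well, which is one reason the shadow region information is useful to the monitoring process.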

The background information generation means 50 calculates the characteristic-similar regions, the estimated shadow region, and the background image, and stores them in the background information storage means 42. For example, the characteristic-similar regions can be calculated by rendering the environment model stored in the environment model storage means 40 using the camera parameters stored in the camera information storage means 41.

Specifically, the background information generation means 50 renders the environment model onto the imaging plane of camera 2 using the camera parameters of camera 2, thereby identifying the background component projected onto each pixel of the image formed on the imaging plane. The lighting condition of the light source does not matter in this rendering; any single lighting condition may be set.

Meanwhile, the background information generation means 50 assigns a reflection characteristic ID as an identifier to each reflection characteristic of the background components included in the environment model. A common reflection characteristic ID may be assigned only to reflection characteristics whose values match exactly, or also to reflection characteristics that are similar enough to be regarded as identical. Similarity is judged from the elements and parameters that make up the reflection characteristic, as described above: two reflection characteristics are judged similar if the difference in every element and parameter is at or below a predetermined threshold. If the environment model already carries an ID for each reflection characteristic, that ID may be used.
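The threshold-based ID assignment described above can be sketched as follows. This is a minimal illustration, not the patented procedure itself: the function name `assign_reflection_ids`, the representation of a reflection characteristic as a tuple of numeric parameters, and the use of one shared tolerance for all parameters are assumptions made for the example.

```python
def assign_reflection_ids(materials, tol):
    """Assign a reflection characteristic ID to each material.

    materials: list of tuples of numeric reflection parameters (assumed format).
    tol: maximum allowed difference per parameter for two materials to share an ID.
    A material is compared against the first material registered under each ID.
    """
    representatives = []  # representatives[i] holds the parameters for ID i + 1
    ids = []
    for params in materials:
        for index, rep in enumerate(representatives):
            same_shape = len(rep) == len(params)
            if same_shape and all(abs(a - b) <= tol for a, b in zip(rep, params)):
                ids.append(index + 1)  # similar enough: reuse the existing ID
                break
        else:
            representatives.append(params)  # no match: register a new ID
            ids.append(len(representatives))
    return ids


# Hypothetical parameters, e.g. (diffuse reflectance, specular reflectance).
print(assign_reflection_ids([(0.30, 0.10), (0.31, 0.10), (0.80, 0.05)], tol=0.02))
```

Setting `tol` to 0 reproduces the exact-match variant mentioned in the text.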

Next, the background information generation means 50 creates a reflection characteristic map whose pixels correspond to the pixels of the captured image, and sets the value of each pixel of the map to the reflection characteristic ID of the background component projected onto that pixel. In this reflection characteristic map, each region made up of pixels with the same pixel value is a characteristic-similar region.
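As a rough sketch of how characteristic-similar regions follow from the reflection characteristic map, the pixels can simply be grouped by their ID value. The nested-list map format and the function name `characteristic_similar_regions` are assumptions for illustration.

```python
from collections import defaultdict


def characteristic_similar_regions(reflection_map):
    """Group pixel coordinates by reflection characteristic ID.

    reflection_map: list of rows, each a list of integer IDs (assumed format).
    Returns a dict mapping each ID to the list of (x, y) pixels holding it.
    """
    regions = defaultdict(list)
    for y, row in enumerate(reflection_map):
        for x, reflection_id in enumerate(row):
            regions[reflection_id].append((x, y))
    return dict(regions)


# Tiny 3 x 2 map: ID 1 = sidewalk, 2 = asphalt, 3 = white line (as in FIG. 3).
regions = characteristic_similar_regions([[1, 1, 2], [1, 3, 2]])
print(regions[2])  # [(2, 0), (2, 1)]
```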

The estimated shadow region and the background image can be calculated by rendering the environment model stored in the environment model storage means 40 under the lighting condition of the captured image, using the camera parameters stored in the camera information storage means 41.

Specifically, the background information generation means 50 first estimates the lighting condition of the light source at the time the captured image was taken and renders the environment model under that lighting condition. That is, it renders the environment model under multiple candidate lighting conditions, calculates the similarity between the captured image and each resulting rendered image, and selects the rendered image with the highest similarity as the background image. It then takes the region of the background image in which a background component casts a shadow (the region where direct light is blocked by a background component) as the estimated shadow region. The background image and estimated shadow region calculated in this way are stored in the background information storage means 42 as described above.
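The selection step above can be illustrated with a short sketch. The text does not specify the similarity measure, so the negative mean absolute luminance difference used here is an assumption, as are the function names and the dictionary of pre-rendered candidate images keyed by a lighting-condition label.

```python
def similarity(image_a, image_b):
    """Negative mean absolute difference (assumed measure): higher is more similar.

    Both images are non-empty lists of rows of luminance values, same size.
    """
    total = 0
    count = 0
    for row_a, row_b in zip(image_a, image_b):
        for a, b in zip(row_a, row_b):
            total += abs(a - b)
            count += 1
    return -total / count


def select_background(captured, renderings):
    """Pick the rendering most similar to the captured image.

    renderings: dict mapping a lighting-condition label to a rendered image.
    Returns (label, rendered image) for the best match.
    """
    best = max(renderings, key=lambda label: similarity(captured, renderings[label]))
    return best, renderings[best]


captured = [[100, 40], [100, 40]]
candidates = {
    "morning": [[90, 90], [90, 90]],
    "noon": [[100, 45], [98, 40]],
}
label, background = select_background(captured, candidates)
print(label)  # noon
```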

Alternatively, the background information generation means 50 can use, as the background image, an image captured while no foreground object is present in the monitored space. It can also take the region where the luminance of the background image is below a predetermined threshold as the estimated shadow region.

FIG. 3 is a schematic diagram showing an example of a reflection characteristic map. The reflection characteristic map 100 in FIG. 3 corresponds to a captured image of a street corner where a building stands on the right side of a roadway, separated from it by a sidewalk. As shown, the reflection characteristic map 100 can be image data whose pixels correspond to the pixels of the captured image, expressed in the same xy coordinate system as the imaging plane of camera 2.

Specifically, the reflection characteristic map 100 is an example for a captured image showing the following background components with differing reflection characteristics: a cobblestone sidewalk, an asphalt-paved road, white lines painted on the road as road markings, and a building wall. Here, for example, the reflection characteristic of the cobblestone sidewalk is given reflection characteristic ID "1", and the reflection characteristics of the asphalt road surface, the white road markings, and the building wall are given IDs "2", "3", and "4", respectively.

In the reflection characteristic map 100, a reflection characteristic ID is set for each region of the captured image occupied by a background component with a distinct reflection characteristic. Image 101 shows the sidewalk region 111 of the reflection characteristic map 100 with hatching; the pixels in the hatched area are set to the value "1" as their reflection characteristic ID. Likewise, images 102, 103, and 104 show the asphalt region 112, the white line region 113, and the wall region 114 with hatching; the pixels in those hatched areas are set to "2", "3", and "4", respectively.

The learning data storage means 43 stores the data used to train the shadow determination model (learning data). The learning data consists of the reflection characteristic information, image features, and learning shadow degree of each local region, for one or more learning captured images. The shadow determination model can be, for example, a tree-based model known as a random forest.

The reflection characteristic information of a local region can be represented, for example, by the reflection characteristic ID assigned to the characteristic-similar region to which the local region belongs. It may instead be some or all of the parameters that describe the reflection characteristic.

The image features of a local region are, for example, the feature values of the captured image in the local region, the feature values of the captured image in the neighborhood of the local region, and the position of the local region; they can be expressed as a vector with these values as its elements.

The feature value of the captured image in a local region can be, for example, the pixel value of the local region: the RGB value for a color image, or the luminance value for a monochrome image. For a color image, it can also be the RGB value converted to another color system or the luminance obtained by converting the RGB value to grayscale, or two or more of the RGB value, the converted color value, and the luminance value. The image features of a local region preferably include at least the feature value of the captured image in that local region.

The feature value of the captured image in the neighborhood of a local region can be, for example, a representative value of the pixel values in the neighborhood, such as the mean or minimum luminance. For a color image, it can also be the mean RGB value, the mean of the RGB values converted to another color system, and so on. Alternatively, it can be the neighborhood pixel values themselves, or two or more of the values listed above. Adding neighborhood feature values to the image features allows learning and determination to take the spatial continuity of shadows into account.

The position of a local region can be its xy coordinates in the captured image. If the local region consists of two or more pixels, the xy coordinates of a predetermined representative pixel (for example, the top-left pixel) may be used. Adding the position to the image features makes it possible to learn and account for small age-related changes, such as dark stains on asphalt.
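A feature vector combining the three kinds of image feature described above might be assembled as in the following sketch. It assumes a single-pixel local region in a grayscale image stored as a list of rows; the function name `local_features`, the square neighborhood, and its default radius are illustrative choices, not values given in the text.

```python
def local_features(image, x, y, radius=1):
    """Build the feature vector for the single-pixel local region at (x, y).

    image: grayscale image as a list of rows of luminance values (assumed format).
    The neighborhood is the square window of the given radius, clipped at the
    image border; in this sketch it includes the pixel itself.
    Returns [luminance, neighborhood mean, neighborhood minimum, x, y].
    """
    height, width = len(image), len(image[0])
    neighborhood = [
        image[ny][nx]
        for ny in range(max(0, y - radius), min(height, y + radius + 1))
        for nx in range(max(0, x - radius), min(width, x + radius + 1))
    ]
    mean_luminance = sum(neighborhood) / len(neighborhood)
    return [image[y][x], mean_luminance, min(neighborhood), x, y]


image = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
print(local_features(image, 1, 1))  # [50, 50.0, 10, 1, 1]
```

For a color image, the same layout could hold RGB values or values in another color system in place of the luminance entries.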

The learning shadow degree of a local region is a value representing how likely it is that a shadow is captured in that local region of the learning captured image. In this embodiment, where the shadow determination model is a random forest, the likelihood output by the model can be, for example, a continuous value in the range [0, 1]. The closer the likelihood is to 1, the more likely the region is a shadow region; the closer it is to 0, the more likely it is a non-shadow region. Correspondingly, the learning shadow degree can be set to 1.0 for pixels inside the estimated shadow region and 0.0 for pixels outside it.

The learning data generation means 51 generates learning data from the learning captured image together with the reflection characteristic information, estimated shadow region, and background image stored in the background information storage means 42, and stores the generated learning data in the learning data storage means 43.

Each time a learning captured image is input, the learning data generation means 51 generates learning data for each local region of that image and accumulates it in the learning data storage means 43. Specifically, for each local region, it extracts the image features of the local region from the learning captured image, sets the learning shadow degree to 1.0 if the local region belongs to the estimated shadow region and to 0.0 otherwise, and appends the reflection characteristic ID, image features, and learning shadow degree, associated with one another, to the learning data storage means 43.

Here it is preferable to generate the learning data while excluding local regions in which a foreground object is captured. To do so, the learning data generation means 51 performs background difference processing or background correlation processing between the learning captured image and the background image, removes from the learning captured image the changed regions that differ from the background image by more than a predetermined threshold, and generates the learning data from what remains.
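The generation of learning data with foreground exclusion can be sketched as below. This is an illustration under simplifying assumptions: grayscale images as nested lists, single-pixel local regions, a plain absolute luminance difference as the background difference, and a reduced feature vector of luminance and position. The function name `build_training_rows` and all parameter names are hypothetical.

```python
def build_training_rows(image, background, reflection_map, shadow_mask, diff_threshold):
    """Create (reflection ID, features, learning shadow degree) rows.

    Pixels that differ from the background image by more than diff_threshold
    are treated as changed (likely foreground) and skipped.
    shadow_mask[y][x] is True inside the estimated shadow region.
    """
    rows = []
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if abs(value - background[y][x]) > diff_threshold:
                continue  # changed region: excluded from the learning data
            shadow_degree = 1.0 if shadow_mask[y][x] else 0.0
            features = [value, x, y]  # simplified feature vector for the sketch
            rows.append((reflection_map[y][x], features, shadow_degree))
    return rows


rows = build_training_rows(
    image=[[200, 60], [20, 195]],
    background=[[198, 62], [190, 196]],
    reflection_map=[[1, 1], [2, 2]],
    shadow_mask=[[False, True], [False, False]],
    diff_threshold=30,
)
print(len(rows))  # 3: the pixel at x=0, y=1 differs strongly from the background
```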

The trained model storage means 44 stores a shadow determination model that takes as input the image features of a local region set in a captured image and outputs a shadow degree representing how likely it is that a shadow is captured in that local region. The stored shadow determination model is a trained model, trained using the image features of local regions in learning captured images and the learning shadow degree, that is, the shadow degree estimated for each such local region. In this embodiment, a trained model is stored for each characteristic-similar region, in association with the reflection characteristic ID of that region.

Learning is performed iteratively, and the shadow determination model is updated accordingly. In particular, as described later, the image monitoring device 1 also performs shadow determination model learning processing and updates the model each time it acquires a captured image during monitoring operation. The determination model handled by the learning device is therefore both a trained model and a model still in the course of training. If the determination model in the learning device is accordingly referred to simply as a learning model, the trained model storage means 44 corresponds to the learning model storage means of the learning device of the present invention.

The learning means 52 uses the learning data stored in the learning data storage means 43 to train, by machine learning, the shadow determination model stored in the trained model storage means 44, and stores the resulting trained model in the trained model storage means 44. That is, the learning means 52 generates the trained model by updating the shadow determination model so that the shadow degree obtained by inputting the image features of a local region of a learning captured image approaches the learning shadow degree of that local region.

In this embodiment, the learning means 52 generates a trained model for each characteristic-similar region. That is, for each characteristic-similar region, it updates the shadow determination model so that the shadow degree obtained by inputting the image features of a local region belonging to that characteristic-similar region approaches the learning shadow degree of that local region.

Instead of a random forest, the shadow determination model can be any of various known models applicable to two-class problems, such as a support vector machine (SVM), an AdaBoost classifier, or a discriminative convolutional neural network (CNN).

An example of learning is described with reference to FIGS. 4 and 5. FIG. 4 is a schematic diagram of images of the monitored space, showing an example of a learning captured image 200 and a rendered image 210. The background components in the learning captured image 200 are the same as in the reflection characteristic map of FIG. 3, and there are four characteristic-similar regions corresponding to reflection characteristic IDs "1" to "4": the sidewalk, the asphalt surface, the white road markings, and the building wall. In addition to the background components, the learning captured image 200 also shows a person 201 in the sidewalk region. The hatched area represents the shadow region 202.

The rendered image 210 is calculated by the background information generation means 50 described above by rendering the environment model and related data. It is generated to correspond to the learning captured image 200 and shows the background image and the estimated shadow region 212 (hatched). In the rendered image 210, the boundary between the shadow region 202 and the sunlit region of the learning captured image 200 is drawn as a dash-dot line, illustrating that the estimated shadow region 212 contains estimation error relative to the true shadow region 202. Such error is tolerated: it is enough that the number of pixels correctly estimated as shadow or non-shadow sufficiently exceeds the number estimated incorrectly.

FIG. 5 is a schematic diagram illustrating the learning data. The hatched areas of images 310 to 345 are the regions of the learning captured image 200 used for learning data. Specifically, letting r be the value of the reflection characteristic ID, the hatched area of image 315 corresponds to the part of the learning captured image 200 that lies in the estimated shadow region 212 (hatched) of the rendered image 210 and has r = 1, and the hatched area of image 310 corresponds to the part that lies outside the estimated shadow region 212, that is, in the region estimated to be non-shadow (estimated non-shadow region), and has r = 1. Likewise, the hatched areas of images 320 and 325 correspond to the estimated non-shadow region and the estimated shadow region 212, respectively, for r = 2; images 330 and 335 show the same for r = 3, and images 340 and 345 for r = 4.

The table on the right side of FIG. 5 schematically represents the learning data stored in the learning data storage means 43. Each row of the table holds the data for one local region of one learning captured image and contains the reflection characteristic ID, the image features, and the learning shadow degree. The image features are written F_{*,r}(m). The subscript * is a label indicating whether the local region lies in the estimated shadow region or the estimated non-shadow region: S means shadow and N means non-shadow. The subscript r is the reflection characteristic ID, and m is an index that distinguishes the multiple local regions that can exist for each combination of shadow/non-shadow label (*) and reflection characteristic ID (r).

For example, for the first local region belonging to the part of the sidewalk estimated to be non-shadow, with the foreground region excluded (the hatched area of image 310), the image features F_{N,1}(1) extracted from the learning captured image 200 are stored in association with the sidewalk's reflection characteristic ID, 1, and the learning shadow degree 0.0. The learning shadow degree is set to 0.0 for the estimated non-shadow region, that is, for local regions labeled N, and to 1.0 for the estimated shadow region, that is, for local regions labeled S. Note that local regions lying in the estimation-error band between the dash-dot line and the estimated shadow region 212 in the rendered image 210 fall in the estimated non-shadow region, so they are labeled N and given a learning shadow degree of 0.0; their image features F_{N,r}(m), however, are extracted from the learning captured image 200 and therefore hold pixel values and the like that correspond to shadow.

As described above, the learning means 52 generates a trained model for each characteristic-similar region. In the example of FIG. 5, the shadow determination model for the sidewalk is therefore trained using the learning data associated with reflection characteristic ID "1", namely the image features F_{N,1}(1), F_{N,1}(2), ... and F_{S,1}(1), F_{S,1}(2), ..., together with their associated learning shadow degrees. The shadow determination models for the asphalt surface, the white road markings, and the building wall are trained in the same way, each with the learning data for its own reflection characteristic ID.
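To make the idea of one model per reflection characteristic ID concrete, the sketch below groups learning rows by ID and fits a separate model to each group. Note the substitution: instead of the random forest named in the text, it fits a single depth-one decision stump on one luminance feature, chosen only to keep the example dependency-free. All function names and the row format `(reflection ID, luminance, learning shadow degree)` are assumptions.

```python
from collections import defaultdict


def fit_stump(samples):
    """Fit a one-split model to (luminance, label) pairs.

    Returns (threshold, degree_below, degree_at_or_above), where each degree is
    the mean label on that side of the split. The split minimizing the summed
    squared error is chosen.
    """
    values = sorted({value for value, _ in samples})
    best = None
    for low_value, high_value in zip(values, values[1:]):
        threshold = (low_value + high_value) / 2
        below = [label for value, label in samples if value < threshold]
        above = [label for value, label in samples if value >= threshold]
        p_below = sum(below) / len(below)
        p_above = sum(above) / len(above)
        error = sum((l - p_below) ** 2 for l in below) + sum((l - p_above) ** 2 for l in above)
        if best is None or error < best[0]:
            best = (error, threshold, p_below, p_above)
    if best is None:  # only one distinct luminance value: no split possible
        mean = sum(label for _, label in samples) / len(samples)
        return (values[0], mean, mean)
    return best[1:]


def train_per_reflection_id(rows):
    """Train one stump per reflection characteristic ID."""
    grouped = defaultdict(list)
    for reflection_id, luminance, label in rows:
        grouped[reflection_id].append((luminance, label))
    return {rid: fit_stump(samples) for rid, samples in grouped.items()}


def shadow_degree(models, reflection_id, luminance):
    """Look up the model for the given ID and return its shadow degree."""
    threshold, p_below, p_above = models[reflection_id]
    return p_below if luminance < threshold else p_above


rows = [
    (1, 40, 1.0), (1, 50, 1.0), (1, 180, 0.0), (1, 200, 0.0),  # bright material
    (2, 10, 1.0), (2, 15, 1.0), (2, 60, 0.0), (2, 70, 0.0),    # dark material
]
models = train_per_reflection_id(rows)
print(shadow_degree(models, 1, 60), shadow_degree(models, 2, 60))  # 1.0 0.0
```

In this toy data the same luminance of 60 is judged shadow on the bright material and non-shadow on the dark one, which is the motivation for keeping separate models per reflection characteristic.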

[Functions of the image monitoring device during determination]
FIG. 6 is a schematic functional block diagram of the image monitoring device 1 when it functions as the shadow detection device according to the present invention, performs shadow determination on a captured image, and uses the result to analyze, for example, whether a monitoring target is present in the monitored space. FIG. 6 mainly shows the functions of the communication unit 3, the storage unit 4, and the image processing unit 5. Specifically, the communication unit 3 functions as the captured image acquisition means 30, the analysis result output means 31, and so on; the storage unit 4 functions as the environment model storage means 40, the camera information storage means 41, the background information storage means 42, the trained model storage means 44, and so on; and the image processing unit 5 functions as the background information generation means 50, the shadow determination means 53, the foreground extraction means 54, the foreground information analysis means 55, and so on.

The functions of the captured image acquisition means 30, the environment model storage means 40, the camera information storage means 41, the background information storage means 42, the trained model storage means 44, and the background information generation means 50 in the shadow detection device are the same as described above for the learning device, so their description is omitted here.

The shadow determination means 53 inputs the image features of each local region of the determination captured image into the trained model stored in the trained model storage means 44, compares the resulting shadow degree (determination shadow degree) with a predetermined criterion, determines local regions whose determination shadow degree exceeds the criterion to be shadow regions, and outputs the determination result to the foreground extraction means 54.

In this embodiment, the shadow determination means 53 uses the characteristic-similar regions stored in the background information storage means 42: it inputs the image features of a local region into the trained model for the characteristic-similar region corresponding to that local region to obtain the shadow degree.

To do so, the shadow determination means 53 first extracts image features from each local region of the determination captured image. The extracted image features are feature vectors in the same format as the image features of the learning data. The shadow determination means 53 also refers to the reflection characteristic map stored in the background information storage means 42 to identify the reflection characteristic ID of the characteristic-similar region to which each local region belongs. It then selects, from the per-region trained models stored in the trained model storage means 44, the one corresponding to that reflection characteristic ID, and inputs the image features of the local region into it to obtain the determination shadow degree.

The criterion could be a single threshold, with local regions whose determination shadow degree is at or above the threshold determined to be shadow regions and those below it determined to be non-shadow regions. In this embodiment, however, the criterion consists of two thresholds T_S and T_N (T_S > T_N) in order to suppress misdetermination in foreground regions. Local regions whose determination shadow degree is T_S or higher are determined to be shadow regions and given the shadow label; those whose determination shadow degree is below T_N are determined to be non-shadow regions and given the non-shadow label; and those whose determination shadow degree is at least T_N but below T_S undergo label interpolation processing.
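The two-threshold rule above maps directly onto a small function. The label strings, the function name `label_by_thresholds`, and the example threshold values 0.7 and 0.3 are assumptions for illustration; the text does not give concrete values for T_S and T_N.

```python
SHADOW = "S"
NON_SHADOW = "N"
UNDETERMINED = "?"


def label_by_thresholds(degree, t_s, t_n):
    """Label a local region from its determination shadow degree.

    degree >= t_s        -> shadow
    degree <  t_n        -> non-shadow
    t_n <= degree < t_s  -> undetermined (left to label interpolation)
    """
    if not t_s > t_n:
        raise ValueError("t_s must be greater than t_n")
    if degree >= t_s:
        return SHADOW
    if degree < t_n:
        return NON_SHADOW
    return UNDETERMINED


for degree in (0.9, 0.5, 0.1):
    print(degree, label_by_thresholds(degree, t_s=0.7, t_n=0.3))
```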

Label interpolation can be, for example, a dilation process: a label image is generated in which the pixel corresponding to each local region holds the label given to that local region, and the regions labeled shadow and the regions labeled non-shadow are dilated into the regions whose label is undetermined.
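One way to picture the dilation-based interpolation is the following sketch. The text does not specify the structuring element or how competing labels are resolved, so the 4-neighborhood, the majority vote, and the choice to leave exact ties undetermined are all assumptions, as are the label strings and the function name `dilate_labels`.

```python
def dilate_labels(labels):
    """Grow shadow ("S") and non-shadow ("N") labels into undetermined ("?") pixels.

    Each pass, an undetermined pixel takes the majority label among its decided
    4-neighbors from the previous pass. Ties are left undetermined. Passes repeat
    until nothing changes.
    """
    grid = [row[:] for row in labels]
    height, width = len(grid), len(grid[0])
    changed = True
    while changed:
        changed = False
        snapshot = [row[:] for row in grid]  # labels as of the previous pass
        for y in range(height):
            for x in range(width):
                if snapshot[y][x] != "?":
                    continue
                votes = {"S": 0, "N": 0}
                for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nx, ny = x + dx, y + dy
                    if 0 <= nx < width and 0 <= ny < height and snapshot[ny][nx] in votes:
                        votes[snapshot[ny][nx]] += 1
                if votes["S"] != votes["N"]:
                    grid[y][x] = "S" if votes["S"] > votes["N"] else "N"
                    changed = True
    return grid


print(dilate_labels([["S", "?", "?", "N"]]))  # [['S', 'S', 'N', 'N']]
```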

Alternatively, label interpolation can be a hole-filling process in which the undetermined regions of the label image are treated as holes and the label image is filtered repeatedly with a hole-filling filter.

Alternatively, label interpolation can be a process of substituting the information of the estimated shadow region estimated by the background information generation means 50. That is, a local region with an undetermined label is given the shadow label if it lies in the estimated shadow region, and the non-shadow label otherwise.
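The substitution variant is the simplest of the three to write down. In this sketch the label image and the estimated shadow region are assumed to be same-sized 2D lists; the function name is hypothetical.

```python
from typing import List, Optional


def fill_from_estimate(
    labels: List[List[Optional[int]]],
    estimated_shadow: List[List[bool]],
) -> List[List[int]]:
    """Replace undetermined labels (None) by the rendered shadow estimate.

    1 = shadow label, 0 = non-shadow label. Decided labels are kept as-is.
    """
    return [
        [(1 if est else 0) if lab is None else lab
         for lab, est in zip(label_row, est_row)]
        for label_row, est_row in zip(labels, estimated_shadow)
    ]
```

Unlike dilation or hole filling, this variant always resolves every undetermined region in a single pass.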

The foreground extraction means 54 receives the captured image from the captured image acquisition means 30 and, from the shadow determination means 53, the determination result of whether each local region set in that captured image is a shadow region. It detects the region of the captured image in which a foreground object appears and outputs the foreground information, which is the detection result, to the foreground information analysis means 55.

For example, of the input captured image, the foreground extraction means 54 treats the region given the non-shadow label in the label image as a non-shadow partial image, in which sunlit areas and the like are captured, and the region given the shadow label as a shadow partial image, in which shaded areas and the like are captured. It detects foreground regions in each of the two partial images and combines them to obtain the foreground information for the captured image.

For the non-shadow partial image, for example, the foreground extraction means 54 performs difference processing against the background image. It obtains the degree of difference in pixel value for each pixel and detects the region where that degree of difference exceeds a predetermined threshold as a foreground region.
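As a rough sketch of this background difference step, the code below uses the absolute grayscale difference as the degree of difference. That choice, the function name, and the 2D-list image representation are assumptions; the text does not fix how the degree of difference is computed.

```python
from typing import List


def foreground_mask(
    image: List[List[int]],
    background: List[List[int]],
    threshold: int,
) -> List[List[bool]]:
    """Mark pixels whose difference from the background exceeds the threshold."""
    return [
        [abs(p - b) > threshold for p, b in zip(img_row, bg_row)]
        for img_row, bg_row in zip(image, background)
    ]
```

The same per-pixel comparison, applied with the first threshold, would split the shadow partial image into strong change and non-strong change regions.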

For the shadow partial image, on the other hand, the foreground extraction means 54 performs difference processing against the background image to obtain the degree of difference in pixel value for each pixel, and takes the region where the degree of difference exceeds a predetermined first threshold as a strong change region and the region where it is at or below the first threshold as a non-strong change region. The foreground extraction means 54 then first extracts the strong change region of the shadow partial image as a foreground region.

The foreground extraction means 54 also generates, for each characteristic-similar region, a luminance histogram of the non-strong change region of the shadow partial image. When the histogram has multiple peaks, it assumes that a foreground object is present in the non-strong change region within that characteristic-similar region and extracts, as a foreground region, the region made up of the pixels forming the peaks other than the one produced by the background. Here, for example, the tallest of the peaks can be presumed to be the one produced by the background. The background peak can also be estimated from the rendering result obtained using the environment model.
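One way to picture the histogram step is the sketch below, which handles the luminance values of the non-strong change pixels of a single characteristic-similar region. The bin width, the minimum peak height, the local-maximum peak detector, and the rule of assigning each pixel to its nearest peak are all assumptions made for illustration. Only the "tallest peak is background" heuristic comes from the text.

```python
from typing import List


def find_peaks(hist: List[int], min_count: int) -> List[int]:
    """Return indices of bins that are local maxima with enough pixels."""
    peaks = []
    n = len(hist)
    for i, count in enumerate(hist):
        left = hist[i - 1] if i > 0 else -1
        right = hist[i + 1] if i < n - 1 else -1
        # Strict on the left so that a flat plateau yields a single peak.
        if count >= min_count and count > left and count >= right:
            peaks.append(i)
    return peaks


def foreground_from_histogram(
    luminances: List[int], bin_width: int = 16, min_count: int = 2
) -> List[bool]:
    """Flag pixels (8-bit luminance assumed) that fall on a non-background peak."""
    n_bins = 255 // bin_width + 1
    hist = [0] * n_bins
    for v in luminances:
        hist[v // bin_width] += 1
    peaks = find_peaks(hist, min_count)
    if len(peaks) < 2:
        return [False] * len(luminances)  # single peak: background only
    background = max(peaks, key=lambda i: hist[i])  # tallest peak = background
    return [
        min(peaks, key=lambda p: abs(p - v // bin_width)) != background
        for v in luminances
    ]
```

A practical version would probably smooth the histogram before peak detection, since raw histograms of small regions tend to be noisy.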

The foreground extraction means 54 combines the foreground region extracted from the non-shadow partial image with the foreground regions extracted from the strong change region and the non-strong change region of the shadow partial image to obtain the region of the captured image in which a foreground object appears. It then generates foreground information containing one or more of the presence or absence of a foreground object, the position or region of the foreground object, the image of the foreground object (foreground image), and the like, and outputs it to the foreground information analysis means 55.

The foreground information analysis means 55 analyzes the image, position, and motion of the foreground object output by the foreground extraction means 54 and outputs the analysis result to the analysis result output means 31. For example, it detects a monitoring target among the foreground objects, estimates the posture of the monitoring target, and tracks the monitoring target.

The analysis result output means 31 outputs the analysis result input from the foreground information analysis means 55 to the notification unit 6.

[Operation of the image monitoring device]
FIG. 7 is a schematic flow diagram illustrating the operation of the image monitoring device 1.

The image processing unit 5 first functions as the background information generation means 50 and calculates the characteristic-similar regions (step S1). In the example of the reflection characteristic map 100 in FIG. 3, the hatched regions of images 101 to 104 are each obtained as a characteristic-similar region. The background information generation means 50 stores the calculated characteristic-similar regions in the background information storage means 42.

With the characteristic-similar regions stored in the background information storage means 42, the communication unit 3 operates as the captured image acquisition means 30 and sequentially acquires captured images from the camera 2 (step S2).

Each time the image processing unit 5 acquires a captured image from the captured image acquisition means 30, it operates as the background information generation means 50 and calculates the estimated shadow region (step S3). Specifically, as described above, it estimates the shadow region by rendering with the environment model and the camera parameters stored in the environment model storage means 40 and the camera information storage means 41, respectively. The background information generation means 50 also calculates a background image by rendering each time a captured image is acquired. Because rendering is performed on every acquisition with changes in the light source, such as the sun, taken into account, the estimated shadow region and background image obtained correspond to that captured image. The background information generation means 50 stores the calculated estimated shadow region and background image in the background information storage means 42.

Next, the image processing unit 5 functions as the learning data generation means 51, generates learning data based on the estimated shadow region and background image updated in step S3, and stores and accumulates it in the learning data storage means 43 (step S4).

The image processing unit 5 determines whether the learning data storage means 43 has accumulated, for every characteristic-similar region, learning data for both local regions belonging to the estimated shadow region and local regions not belonging to it (step S5). If, for any characteristic-similar region, learning data for local regions belonging to the estimated shadow region has not been accumulated, or learning data for local regions not belonging to it has not been accumulated, the accumulation is judged insufficient ("NO" in step S5) and steps S2 to S4 are repeated until sufficient data has accumulated. Incidentally, if the latest captured image for determination satisfies the condition, the data from that single image alone is judged to be sufficient accumulation.
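The sufficiency check of step S5 amounts to asking whether every characteristic-similar region has at least one sample of each kind. A minimal sketch, assuming each stored sample is reduced to a `(region_id, in_estimated_shadow)` pair (a representation introduced here for illustration):

```python
from typing import Hashable, Iterable, Tuple


def accumulation_sufficient(
    samples: Iterable[Tuple[Hashable, bool]],
    region_ids: Iterable[Hashable],
) -> bool:
    """True if every region has both a shadow and a non-shadow sample."""
    seen = set(samples)  # distinct (region_id, in_estimated_shadow) pairs
    return all(
        (rid, True) in seen and (rid, False) in seen
        for rid in region_ids
    )
```

Under this reading, a single captured image can already satisfy the condition if it covers both kinds of local region in every characteristic-similar region.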

Whether the accumulation is sufficient may also be judged by the accumulation time or by the number of accumulated data items. Because the acquisition rate of captured images and the number of data items obtained from each image are basically constant, judging by accumulation time and judging by the number of accumulated data items are equivalent. The condition on the presence of belonging and non-belonging data may also be combined with the condition on accumulation time or number of accumulated data items, with the accumulation judged sufficient when either condition is satisfied.

When sufficient learning data has accumulated ("YES" in step S5), the image processing unit 5 starts the process of detecting the monitoring target in the captured image.

In this monitoring target detection process, the shadow determination model learning process S6, which relates to the learning device of the present invention, is performed first, followed by the shadow determination process S7, which relates to the shadow detection device of the present invention.

FIG. 8 is a schematic flow diagram of the shadow determination model learning process S6. The image processing unit 5 performs this process by functioning as the learning means 52. The learning means 52 sets each characteristic-similar region in turn as the region of interest (step S60) and performs steps S61 to S63 in a loop over all characteristic-similar regions (step S64). In this loop, the learning means 52 reads the learning data for the region of interest from the learning data storage means 43 (step S61), generates or updates the shadow determination model for the region of interest (step S62), and stores it as a trained model in the trained model storage means 44 (step S63).

If steps S61 to S63 have not been completed for all characteristic-similar regions ("NO" in step S64), the learning means 52 returns the process to step S60, sets an unprocessed characteristic-similar region as the region of interest, and repeats the loop.

When the loop has been completed for all characteristic-similar regions ("YES" in step S64), the image processing unit 5 advances the process to step S7.
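The per-region loop of steps S60 to S64 can be outlined as below. The actual form of the shadow determination model is not restated in this passage, so the sketch takes the fitting routine as a parameter; `fit_constant` is a deliberately trivial, hypothetical stand-in that ignores the image feature, included only so the example runs.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], float]     # (image feature, shadow degree for learning)
Model = Callable[[Sequence[float]], float]  # image feature -> shadow degree


def train_per_region(
    samples: Iterable[Tuple[Hashable, Sequence[float], float]],
    fit: Callable[[List[Sample]], Model],
) -> Dict[Hashable, Model]:
    """Group learning data by characteristic-similar region and fit one model each."""
    grouped: Dict[Hashable, List[Sample]] = defaultdict(list)
    for region_id, feature, degree in samples:       # S61: read the region's data
        grouped[region_id].append((feature, degree))
    # S62/S63: generate a model per region and keep it keyed by region
    return {rid: fit(data) for rid, data in grouped.items()}


def fit_constant(data: List[Sample]) -> Model:
    """Stand-in model: always predicts the mean shadow degree for learning."""
    mean = sum(degree for _, degree in data) / len(data)
    return lambda feature: mean
```

The returned dictionary plays the role of the trained model storage means 44 in this sketch: shadow determination would look up the model by the region to which a local region belongs.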

FIG. 9 is a schematic flow diagram of the shadow determination process S7. The image processing unit 5 performs this process by functioning as the shadow determination means 53. Whereas step S3 estimated the shadow region by rendering, the shadow determination process S7 re-estimates the shadow region using the shadow determination model. The shadow determination means 53 sets each local region in the captured image in turn as the local region of interest (step S70) and repeats steps S71 to S76, which estimate whether the region is a shadow region, in a loop over all local regions (step S77).

First, the shadow determination means 53 reads from the trained model storage means 44 the trained model of the characteristic-similar region to which the local region of interest belongs, and uses that trained model to calculate the shadow degree of the local region of interest (step S71).

If the shadow degree of the local region of interest is at or above the threshold T_S ("YES" in step S72), the shadow determination means 53 gives the local region of interest the shadow label (step S73); if the shadow degree is below the threshold T_N ("NO" in step S72 and "YES" in step S74), it gives the non-shadow label (step S75).

After this labeling, the shadow determination means 53 performs the label interpolation described above (step S76), which determines a label for a local region of interest whose label is still undetermined. Specifically, label interpolation is performed when the shadow degree is T_N or more and less than T_S ("NO" in step S72 and "NO" in step S74).

If steps S71 to S76 have not been completed for all local regions ("NO" in step S77), the shadow determination means 53 returns the process to step S70, sets an unprocessed local region as the local region of interest, and repeats the loop.

When the loop has been completed for all local regions ("YES" in step S77), the image processing unit 5 advances the process to step S8 of FIG. 7.

The image processing unit 5 then functions as the foreground extraction means 54 and the foreground information analysis means 55. The foreground extraction means 54 uses the shadow region information input from the shadow determination means 53 to extract the foreground objects appearing in the captured image and outputs the foreground information to the foreground information analysis means 55 (step S8). The foreground information analysis means 55 analyzes the foreground information input from the foreground extraction means 54 and outputs the analysis result to the analysis result output means 31 (step S9). The analysis result output means 31 then outputs the analysis result input from the foreground information analysis means 55 to the notification unit 6 (step S10).

When the above processing has been completed for the captured image acquired in step S2, the process returns to step S2, and steps S3 to S10, which include the generation of learning data, the learning of the shadow determination model, and the shadow determination using it as described above, are repeated for each newly acquired captured image.

[Modifications]
(1) The above embodiment showed an example in which the background information generation means 50 calculates the characteristic-similar regions by rendering the environment model, but the background information generation means 50 can also calculate the characteristic-similar regions by applying a process called semantic segmentation to the background image.

Semantic segmentation is described in, for example, "Pyramid Scene Parsing Network," Hengshuang Zhao, et al., IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, and "DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs," LC Chen, et al., IEEE Transactions on Pattern Analysis and Machine Intelligence 40 (4), 834-848.

In that case, the storage unit 4 stores in advance a trained model that has learned each of the images of background and foreground components, which include images of the background components and images of objects expected to appear in the monitored space. The background information generation means 50 then searches the captured image using that trained model to partition the entire captured image into regions for each background or foreground component, and calculates the characteristic-similar regions by assigning mutually different reflection characteristic IDs to the regions of the background components among the partitioned regions.

(2) The above embodiment and its modifications described a camera 2 whose field of view is fixed and whose camera parameters are constant. In the above embodiment and its modifications, however, a camera 2 whose camera parameters change can also be used, such as a PTZ camera capable of panning, tilting, and zooming, an in-vehicle camera, or an aerial camera. In that case, the image processing unit 5 updates the characteristic-similar regions when it detects a change in the camera parameters.

For example, the camera 2 calculates the camera parameters at each capture and outputs them together with the captured image. In step S2 of the processing flow shown in FIG. 7, the captured image acquisition means 30 outputs the input camera parameters to the background information generation means 50, which compares them with the camera parameters stored in the camera information storage means 41 to determine whether they match. If they do not match, the background information generation means 50 overwrites the camera parameters in the camera information storage means 41 with the input ones, calculates the characteristic-similar regions in the same manner as step S1 of FIG. 7, and overwrites the characteristic-similar regions in the background information storage means 42 with the result.
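The compare-and-overwrite behavior described here is sketched below. A plain dictionary stands in for the camera information storage means 41 and the background information storage means 42, and `compute_regions` stands in for the step S1 calculation; these names and the dictionary layout are assumptions for illustration.

```python
from typing import Any, Callable, Dict


def refresh_regions_if_camera_changed(
    store: Dict[str, Any],
    new_params: Dict[str, float],
    compute_regions: Callable[[Dict[str, float]], Any],
) -> bool:
    """Recompute characteristic-similar regions when camera parameters differ.

    Returns True if the stored parameters and regions were overwritten.
    """
    if store.get("camera_params") == new_params:
        return False  # parameters match: keep the stored regions
    store["camera_params"] = dict(new_params)             # overwrite parameters
    store["similar_regions"] = compute_regions(new_params)  # redo step S1
    return True
```

The sketch uses exact equality. With measured pan, tilt, or zoom values, a tolerance-based comparison may be more appropriate in practice.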

(3) The above embodiment and its modifications showed an example in which the learning data is generated from the local regions of individual captured images for learning and, correspondingly, the shadow determination means 53 extracts the determination data from the local regions of individual captured images for determination. In the above embodiment and each modification, however, the learning data may be generated from a time series of captured images for learning at a local region, and the determination data may be extracted from a time series of captured images for determination at a local region.

For example, the image feature in the learning data can be a vector formed by taking the five feature vectors extracted from the same local region in five captured images for learning taken at 10-minute intervals, arranging them in descending order of time, and concatenating them. Correspondingly, each image feature in the determination data is, for example, a vector formed by concatenating the five feature vectors extracted from the same local region in five captured images for determination taken at 10-minute intervals.

In that case, the shadow degree for learning and the shadow degree for determination also each become a time series. In the above example using a time series of five captured images, each is a five-dimensional vector of five shadow degrees arranged in descending order of time, and the shadow degree for determination for the latest captured image for determination is the first element of that vector.
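Building the concatenated time-series feature for one local region could look like this sketch. The `(timestamp, feature_vector)` representation and the function name are assumptions; the window of five frames matches the example in the text.

```python
from typing import List, Sequence, Tuple


def concat_time_series(
    frames: Sequence[Tuple[float, Sequence[float]]],
    window: int = 5,
) -> List[float]:
    """Concatenate the newest `window` feature vectors, latest first.

    `frames` holds (timestamp, feature_vector) pairs for one local region.
    """
    if len(frames) < window:
        raise ValueError("not enough frames for the requested window")
    # Descending order of time, so the latest frame's features come first.
    latest = sorted(frames, key=lambda f: f[0], reverse=True)[:window]
    return [x for _, vec in latest for x in vec]
```

With the same ordering applied to the model output, the shadow degree for the latest captured image is simply element 0 of the output vector.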

Treating the image features and shadow degrees as time-series data in this way allows learning and determination to take into account temporal changes from shadow to non-shadow and from non-shadow to shadow, so that shadows can be learned, and shadow regions determined, with higher accuracy. This configuration is useful when learning data may be accumulated over a long period.

(4) The above embodiment and its modifications showed an example in which the latest captured image is included among the captured images for learning, but once data has been accumulated over a long period in the same field of view, updating of the shadow determination model can be stopped so that the latest captured image is no longer included. In that case, for example, letting T_1 be the threshold applied to the accumulation time in step S5, a threshold T_2 sufficiently larger than T_1 is used: the learning data generation means 51 stops generating learning data when the accumulation time is T_2 or more, and the learning means 52 stops updating the shadow determination model when the accumulation time is T_2 or more.
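The interplay of the two accumulation-time thresholds can be sketched as a small gating function. Treating "accumulation time of T_1 or more" as the sufficiency condition of step S5 is an assumption here, as are the function name and the example values.

```python
from typing import Tuple


def learning_flags(accumulated: float, t_1: float, t_2: float) -> Tuple[bool, bool]:
    """Return (detection_ready, keep_learning) for a given accumulation time.

    detection_ready: accumulation time has reached T_1 (assumed S5 condition).
    keep_learning:   data generation and model updates continue below T_2.
    """
    if not t_2 > t_1:
        raise ValueError("T_2 must be larger than T_1")
    return accumulated >= t_1, accumulated < t_2


if __name__ == "__main__":
    # Illustrative values only (e.g. hours of accumulation).
    for hours in (0.5, 3.0, 200.0):
        print(hours, learning_flags(hours, t_1=1.0, t_2=168.0))
```

Between T_1 and T_2 both flags are true, which corresponds to the normal operation of the embodiment where the model keeps being updated while detection runs.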

(5) The above embodiment and its modifications showed an example in which the shadow determination means 53 uniquely selects, from among the trained models for the individual characteristic-similar regions, the one for the characteristic-similar region to which the local region belongs, and inputs the image feature of that local region into it. In the above embodiment and each modification, the shadow determination means 53 may instead make the determination for a local region using the trained models of multiple characteristic-similar regions, without selecting a single trained model.

For example, the shadow determination means 53 inputs the image feature of a local region in the captured image for determination into all of the trained models to obtain multiple shadow degrees for determination for that local region, and compares the maximum of these with the shadow determination threshold. This enables accurate shadow determination even when the characteristic-similar regions have been set with some error.

Instead of all the trained models, the image feature may be input into the trained models for the characteristic-similar region to which the local region belongs and for the characteristic-similar regions adjacent to it. This likewise enables accurate shadow determination even when the characteristic-similar regions have been set with some error.
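Both variants of this modification reduce to taking the maximum shadow degree over a chosen set of models. In the sketch below, models are assumed to be callables keyed by region ID, and `region_ids` is a hypothetical parameter for restricting the set to the region itself and its neighbors.

```python
from typing import Callable, Dict, Hashable, Iterable, Optional, Sequence

Model = Callable[[Sequence[float]], float]  # image feature -> shadow degree


def max_shadow_degree(
    feature: Sequence[float],
    models: Dict[Hashable, Model],
    region_ids: Optional[Iterable[Hashable]] = None,
) -> float:
    """Largest shadow degree over all models, or over the listed regions only."""
    ids = models.keys() if region_ids is None else region_ids
    return max(models[rid](feature) for rid in ids)
```

The returned value would then be compared with the shadow determination threshold in the same way as a single model's output.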

(6) The above embodiment and its modifications showed an example in which a shadow determination model is learned for each characteristic-similar region and shadow determination is performed using the trained model for each characteristic-similar region. In the above embodiment and each modification, however, a single shadow determination model common to all characteristic-similar regions can instead be learned, and shadow determination performed using that one trained model.

The trained model in that case is preferably one that receives as input the image feature of a local region together with reflection characteristic information (for example, the reflection characteristic ID) for the characteristic-similar region to which the local region belongs, and outputs the shadow degree for that local region. Correspondingly, the learning means 52 generates the trained model by updating the shadow determination model so that the shadow degree obtained by inputting the image feature of a local region in a captured image for learning, together with the reflection characteristic information for the characteristic-similar region to which that local region belongs, approaches the shadow degree for learning. The shadow determination means 53, for its part, determines whether a local region is a shadow region based on the shadow degree obtained by inputting into the trained model the image feature of the local region in the captured image for determination and the reflection characteristic information for the characteristic-similar region to which that local region belongs.
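One common way to feed a categorical ID such as the reflection characteristic ID into a single model is to append it to the image feature as a one-hot vector. The text does not say how the reflection characteristic information is encoded, so the encoding below is an assumption, as are the function name and the zero-based IDs.

```python
from typing import List, Sequence


def build_model_input(
    feature: Sequence[float],
    reflection_id: int,
    num_ids: int,
) -> List[float]:
    """Append a one-hot encoding of the reflection characteristic ID."""
    if not 0 <= reflection_id < num_ids:
        raise ValueError("reflection_id out of range")
    one_hot = [0.0] * num_ids
    one_hot[reflection_id] = 1.0
    return list(feature) + one_hot
```

The same function would be used when building learning inputs and determination inputs, so that the single model sees a consistent input layout in both phases.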

In the image monitoring device 1 described above, the shadow determination means 53 performs shadow determination on the captured image using a trained model that, given the image feature of a local region of the captured image, outputs the shadow degree of that local region on the basis of learning. Shadow regions can therefore be determined with high accuracy in captured images of a monitored space with a complex background.

In particular, in the image monitoring device 1, the shadow determination means 53 performs shadow determination on the captured image using trained models learned for each characteristic-similar region. Consequently, even in a captured image of a monitored space with a complex background (a captured image of a background made up of multiple background components with mutually different reflection characteristics), the background within each characteristic-similar region has substantially a single reflection characteristic, which enables accurate shadow determination.

Furthermore, in the image monitoring device 1, the learning means 52 can train the above model, so a trained model can be generated that is capable of accurately determining shadow regions in captured images of a monitored space with a complex background.

In particular, in the image monitoring device 1, the learning means 52 performs learning for each characteristic-similar region. Consequently, even for a captured image of a monitored space with a complex background, the background within each characteristic-similar region has substantially a single reflection characteristic, so a trained model capable of accurate shadow determination can be generated.

1 image monitoring device, 2 camera, 3 communication unit, 4 storage unit, 5 image processing unit, 6 notification unit, 30 captured image acquisition means, 31 analysis result output means, 40 environment model storage means, 41 camera information storage means, 42 background information storage means, 43 learning data storage means, 44 trained model storage means, 50 background information generation means, 51 learning data generation means, 52 learning means, 53 shadow determination means, 54 foreground extraction means, 55 foreground information analysis means.

Claims

A shadow detection device that detects a shadow region, in which a shadow is captured, in a captured image of a predetermined space, the device comprising:
trained model storage means for storing a trained model that takes as input an image feature of a local region set in the captured image and outputs a shadow degree indicating how likely it is that a shadow is captured in that local region, the trained model having been trained using image features of local regions set in captured images for learning in which the space is captured and learning shadow degrees, each being the shadow degree estimated for the corresponding local region; and
shadow determination means for comparing the shadow degree, obtained by inputting the image feature of the local region of the captured image into the trained model, with a predetermined reference, and determining a local region whose shadow degree exceeds the reference to be the shadow region.

The shadow detection device according to claim 1, wherein the trained model storage means stores, for each characteristic-similar region in which the reflection characteristics of background components that can appear in the captured image are similar, the trained model on which the learning was performed for that characteristic-similar region.

The shadow detection device according to claim 2, further comprising background information storage means for storing the characteristic-similar regions,
wherein the shadow determination means obtains the shadow degree by inputting the image feature of the local region into the trained model of the characteristic-similar region corresponding to that local region.

The shadow detection device according to claim 1, further comprising background information storage means for storing characteristic-similar regions in which the reflection characteristics of background components that can appear in the captured image are similar,
wherein the trained model storage means stores the trained model on which the learning was performed using, in addition to the image feature and the learning shadow degree for each local region set in the captured images for learning, attribution information indicating the characteristic-similar region to which that local region belongs, and
the shadow determination means obtains the shadow degree by inputting the image feature of the local region and the attribution information for that local region into the trained model.

A learning device that trains a learning model that takes as input an image feature of a local region set in a captured image of a predetermined space and outputs a shadow degree indicating how likely it is that a shadow is captured in that local region, the device comprising:
learning model storage means for storing the learning model;
learning data storage means for storing at least an image feature of a local region set in a captured image for learning in which the space is captured and a learning shadow degree that is the shadow degree estimated for that local region; and
learning means for inputting at least the image feature of the local region in the captured image for learning into the learning model, and performing learning that updates the learning model based on the error of the obtained shadow degree with respect to the learning shadow degree of that local region.

The learning device according to claim 5, wherein the learning model storage means stores the learning model for each characteristic-similar region in which the reflection characteristics of background components that can appear in the captured image are similar, and
the learning means performs the learning on the learning model of each characteristic-similar region.

A shadow detection method for detecting a shadow region, in which a shadow is captured, in a captured image of a predetermined space, the method using a trained model that takes as input an image feature of a local region set in the captured image and outputs a shadow degree indicating how likely it is that a shadow is captured in that local region, the trained model having been trained using image features of local regions set in captured images for learning in which the space is captured and learning shadow degrees, each being the shadow degree estimated for the corresponding local region, the method comprising:
an input step of inputting the image feature of the local region of the captured image into the trained model; and
a shadow determination step of comparing the shadow degree output from the trained model with a predetermined reference, and determining a local region whose shadow degree exceeds the reference to be the shadow region.

A shadow detection program for causing a computer to perform processing that detects a shadow region, in which a shadow is captured, in a captured image of a predetermined space, the program causing the computer to function as:
trained model storage means for storing a trained model that takes as input an image feature of a local region set in the captured image and outputs a shadow degree indicating how likely it is that a shadow is captured in that local region, the trained model having been trained using image features of local regions set in captured images for learning in which the space is captured and learning shadow degrees, each being the shadow degree estimated for the corresponding local region; and
shadow determination means for comparing the shadow degree, obtained by inputting the image feature of the local region of the captured image into the trained model, with a predetermined reference, and determining a local region whose shadow degree exceeds the reference to be the shadow region.

A learning method for training a learning model that takes as input an image feature of a local region set in a captured image of a predetermined space and outputs a shadow degree indicating how likely it is that a shadow is captured in that local region, the method using learning model storage means for storing the learning model and learning data storage means for storing at least an image feature of a local region set in a captured image for learning in which the space is captured and a learning shadow degree that is the shadow degree estimated for that local region, the method comprising:
an input step of inputting at least the image feature of the local region in the captured image for learning into the learning model; and
a learning step of performing learning that updates the learning model based on the error of the shadow degree output from the learning model with respect to the learning shadow degree of that local region.

A learning program for causing a computer to perform processing that trains a learning model that takes as input an image feature of a local region set in a captured image of a predetermined space and outputs a shadow degree indicating how likely it is that a shadow is captured in that local region, the program causing the computer to function as:
learning model storage means for storing the learning model;
learning data storage means for storing at least an image feature of a local region set in a captured image for learning in which the space is captured and a learning shadow degree that is the shadow degree estimated for that local region; and
learning means for inputting at least the image feature of the local region in the captured image for learning into the learning model, and performing learning that updates the learning model based on the error of the obtained shadow degree with respect to the learning shadow degree of that local region.