JP2019012328A

JP2019012328A - Person action estimation system

Info

Publication number: JP2019012328A
Application number: JP2017127426A
Authority: JP
Inventors: Kazunori Wakui (和久井 一則); Taisuke Kano (加納 泰輔); Hiroaki Misawa (三沢 博章)
Original assignee: Hitachi Industry and Control Solutions Co Ltd
Current assignee: Hitachi Industry and Control Solutions Co Ltd
Priority date: 2017-06-29
Filing date: 2017-06-29
Publication date: 2019-01-24
Anticipated expiration: 2037-06-29
Also published as: JP6783713B2

Abstract

PROBLEM TO BE SOLVED: Recognizing a person's motion by video analysis often fails to estimate the person's behavior accurately, because blind spots frequently occur in the video.

SOLUTION: A human behavior estimation system 1 that determines behaviors a person performs using a tool comprises: a human behavior determination unit 15 that outputs, from a predetermined human behavior definition 12, human behavior candidates for a behavior captured in a video; a tool motion determination unit 25 that, on the basis of sensor information from a tool 101, outputs, from a predetermined tool motion definition 22, tool motion candidates for the tool from which the sensor information was acquired; and a comprehensive human behavior determination unit 2 that estimates the behavior captured in the video on the basis of the human behavior candidates and the tool motion candidates.

SELECTED DRAWING: Figure 1

Description

The present invention relates to a human behavior estimation system.

Patent Document 1 states: "A person behavior determination device comprises: person region detection means that detects, by machine learning at a predetermined frame interval, one or more person regions contained in a video; person trajectory generation means that calculates a feature amount for each person region, determines person regions whose feature amounts are similar across a plurality of frame images to be person regions of the same person, and generates a person trajectory by connecting the centroid positions of the person regions of that person; and person behavior determination means that calculates a feature amount for each person trajectory, determines whether the feature amount of the person trajectory satisfies a behavior condition, and, when it does, determines that the person is performing the behavior corresponding to that behavior condition."

Patent Document 1: JP 2011-100175 A

Recognizing a person's motion by video analysis presupposes that the person region can be cut out of the video, that the person's outline is acquired accurately, and that the person is therefore extracted accurately. Even when the person can be extracted, however, blind spots arise depending on the position of the camera, the hands, the body, and so on, which makes it difficult to estimate the person's behavior accurately.

Patent Document 1 describes a person behavior determination device that determines a person's behavior according to whether the feature amount of the person trajectory satisfies a behavior condition set in advance for each behavior. However, it is difficult to set a behavior condition for a behavior hidden in a blind spot, and in such cases the available information is considered insufficient to determine the person's behavior accurately.

To solve this problem, the configuration described in the claims is adopted. For example, a human behavior estimation system according to the present invention, which determines a behavior that a person performs using a tool, has: a video acquisition unit that acquires a video capturing the behavior; a human behavior determination unit that, based on the video from the video acquisition unit, outputs, from a predetermined human behavior definition, human behavior candidates for the behavior captured in the video; a tool data acquisition unit that acquires sensor information from a sensor attached to the tool; a tool motion determination unit that, based on the sensor information from the tool data acquisition unit, outputs, from a predetermined tool motion definition, tool motion candidates for the tool from which the sensor information was acquired; and a comprehensive human behavior determination unit that estimates the behavior captured in the video from the video acquisition unit, based on the human behavior candidates output by the human behavior determination unit and the tool motion candidates output by the tool motion determination unit.

The behavior of the target person can be estimated with high accuracy by combining behavior estimation from video of the person with motion estimation of the tool the person is using.

FIG. 1 is a diagram showing the processing flow of the human behavior estimation system.
FIG. 2 is an example of dividing the monitored space into regions.
FIG. 3 is a diagram showing the definition (learning data) of human behavior information for video.
FIG. 4 is a diagram showing the definition (learning data) of tool motion information for tool (sensor) data.
FIG. 5 is a flowchart of the comprehensive human behavior determination unit.
FIG. 6A is an example of a video of the monitored space.
FIG. 6B is another example of a video of the monitored space.
FIG. 6C is another example of a video of the monitored space.
FIG. 7 is a diagram mapping human behavior estimation results to tool motion estimation results.
FIG. 8 is a diagram showing a hardware configuration that implements the human behavior estimation system.

FIG. 1 shows the processing flow of the human behavior estimation system 1. The human behavior estimation system 1 may be implemented either in a local environment such as an ordinary PC (personal computer) or over a network, as in a cloud environment.

FIG. 8 shows an example hardware configuration that implements the human behavior estimation system 1. A computer 800 includes a processor 801, a main memory 802, an auxiliary storage 803, an input/output interface 804, a display interface 805, and a network interface 806, all connected by a bus 807. The input/output interface 804 is connected to an input device 809 such as a keyboard or mouse and provides a user interface. The display interface 805 is connected to a display 808. The network interface 806 connects the computer 800 to an external network (not shown).

The auxiliary storage 803 is usually a nonvolatile memory such as an HDD or flash memory, and stores the programs executed by the computer 800 and the data those programs process. The main memory 802 is RAM and, under instructions from the processor 801, temporarily holds programs and the data needed to run them. The processor 801 executes programs loaded from the auxiliary storage 803 into the main memory 802.

Each processing block of the human behavior estimation system 1 is stored as a program in the auxiliary storage 803, loaded from the auxiliary storage 803 into the main memory 802, and executed by the processor 801. The databases in FIG. 1 that store specific data are likewise held in the auxiliary storage 803, or are read from the auxiliary storage 803 into the main memory 802 for processing. The following description uses, as one application example, a monitoring system that checks whether workers on a production line are attaching and machining parts correctly.

The camera device 200 captures the person whose behavior is to be estimated (in this example, a worker on the production line). The video acquisition unit 10 acquires the video (moving images) captured by the camera device 200 and stores it in the video information storage unit 11. The behaviors the system should determine are defined in advance as the human behavior definition 12. For example, table 400 in FIG. 3 shows human behaviors defined using three categories: "person", "region", and "motion".

The "person" category defines who the target person is. In this example, "person A" and "person B" are defined.

The "region" category defines the region of the captured video in which the target person is acting (working). Because it is largely fixed which worker on a production line performs which task where, the location of the target person is important information for behavior estimation. For this reason, the video space captured by the camera device 200 is divided as shown in FIG. 2 (into nine regions, 3 × 3, in this example), and the position where the target person appears is defined. For simplicity, this embodiment assumes a fixed camera device 200 and divides the video space into regions; if the camera device 200 is a stereo camera, however, the regions may be defined in three-dimensional space. Alternatively, the real space may be divided into regions, and a known video processing technique may be used to analyze from the video which real-space region the target person is in.

The "motion" category defines what action (task) the target person is performing. In this example, tasks such as "tighten a screw", "drill a hole", and "loosen a screw" are defined. The definition should follow what is to be determined from the video (moving images) captured by the camera device 200. The "motion" category can be established by using work manuals and the like to identify the tasks that workers perform on the production line and that the monitoring system should determine. Not all of these categories need to be defined: if there is no need to identify who the target person is, the "person" category can be omitted. Conversely, other categories, such as the worker's clothing or equipment, may be added as needed.
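The 3 × 3 division of the video space can be sketched as a small helper that turns a person's bounding box into region labels. This is an illustrative sketch only: the function names, the pixel-based bounding box, and the convention that the X index counts columns from the left and the Y index counts rows from the top are assumptions, since the orientation of the grid in FIG. 2 is not stated in the text.

```python
def cell_index(x, y, width, height, cols=3, rows=3):
    """Return the 1-based (column, row) of the grid cell containing pixel (x, y).

    Assumption: X counts columns from the left, Y counts rows from the top.
    """
    if not (0 <= x < width and 0 <= y < height):
        raise ValueError("point lies outside the frame")
    return int(x * cols // width) + 1, int(y * rows // height) + 1


def regions_for_box(left, top, right, bottom, width, height, cols=3, rows=3):
    """Return labels such as "X1Y2" for every grid cell a bounding box touches."""
    c0, r0 = cell_index(left, top, width, height, cols, rows)
    c1, r1 = cell_index(right, bottom, width, height, cols, rows)
    return [f"X{c}Y{r}" for c in range(c0, c1 + 1) for r in range(r0, r1 + 1)]


# A hypothetical person box in a 1920x1080 frame spanning two vertical cells.
print(regions_for_box(100, 100, 500, 700, width=1920, height=1080))
# ['X1Y1', 'X1Y2']
```

A box that straddles a cell boundary yields several labels, which is one way a single task can end up associated with more than one region.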

Meanwhile, sensors 103a and 103b, such as vibration sensors or acceleration sensors, are attached to the tools used by the target person: a screwdriver 101a and a gimlet 101b. The type of sensor is not limited; it may differ from tool to tool, and a tool may carry more than one sensor. The tool data acquisition unit 20 acquires position information indicating where the tool 101 is used, together with sensor data from the sensor 103 attached to the tool 101, information derived from that sensor data, or output information of the tool, and stores it in the tool data storage unit 21. For example, when the acceleration sensor 103a is attached to the screwdriver 101a, the acceleration data acquired while a screw is being tightened, or trajectory data derived from it, is accumulated. The position at which the tool 101 is used may be obtained from the tool itself or detected from the video acquired by the video acquisition unit 10. The tool motions that accompany the behaviors to be determined are defined in advance as the tool motion definition 22. For example, table 500 in FIG. 4 shows such a definition using three categories: "tool", "region", and "motion".

The "tool" category defines the tools used by the target person. In this example, "screwdriver" and "gimlet" are defined. The "region" category defines the region in which the tool is used. Because it is largely fixed which worker on a production line performs which task where, the places where a tool is used are limited accordingly. Regions can be defined in the same way as in the human behavior definition, and may use either the same region division as the human behavior definition or a different one. The example in FIG. 4 uses the same division. The "motion" category defines the motions performed with the tool. In this example, "tighten a screw" and "loosen a screw" are defined for the screwdriver, and "drill a hole" for the gimlet.
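The two definitions can be pictured as small record types. The sketch below is one possible representation, not the layout of the tables in FIG. 3 and FIG. 4, which are not reproduced here; the class and field names are hypothetical, and only records that the description mentions by number are filled in. The labels "screwdriver" and "gimlet" stand for tools 101a and 101b.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class HumanBehaviorDefinition:
    regions: FrozenSet[str]
    motion: str
    person: Optional[str] = None  # may be omitted when identity is not needed


@dataclass(frozen=True)
class ToolMotionDefinition:
    tool: str
    regions: FrozenSet[str]
    motion: str


# Records taken from the description; the full tables contain more rows.
HUMAN_BEHAVIORS = {
    401: HumanBehaviorDefinition(frozenset({"X1Y1", "X1Y2"}), "tighten a screw", "person A"),
    402: HumanBehaviorDefinition(frozenset({"X2Y1", "X2Y2"}), "tighten a screw", "person A"),
}
TOOL_MOTIONS = {
    501: ToolMotionDefinition("screwdriver", frozenset({"X1Y3"}), "tighten a screw"),
    502: ToolMotionDefinition("screwdriver", frozenset({"X2Y1"}), "tighten a screw"),
    503: ToolMotionDefinition("screwdriver", frozenset({"X2Y2"}), "tighten a screw"),
}


def regions_for_tool(tool, definitions):
    """Collect every region in which the given tool has a defined motion."""
    found = set()
    for definition in definitions.values():
        if definition.tool == tool:
            found |= definition.regions
    return sorted(found)


print(regions_for_tool("screwdriver", TOOL_MOTIONS))
```

Keeping `person` optional mirrors the remark that a category can be left out when it is not needed.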

The human behavior learning unit 13 first labels the video acquired from the camera device 200 and stored in the video information storage unit 11 according to the human behavior definition 12. The labeled video serves as learning data. Table 400 in FIG. 3 is an example of learning data; record 401, for instance, defines the video "a.mpeg" as a video in which "person A" performs the motion "tighten a screw" in regions "X1Y1 and X1Y2". A model that determines human behavior is built from the learning data 400. For example, machine learning such as deep learning is used to create a model that determines a person's behavior from video acquired from the camera device 200. The resulting model is saved in the human behavior learning result storage unit 14.

The tool motion learning unit 23 first labels the tool data acquired from the sensor 103 attached to the tool 101 and stored in the tool data storage unit 21 according to the tool motion definition 22. The labeled tool data serves as learning data. Table 500 in FIG. 4 is an example of learning data; record 501, for instance, defines the tool data "a.csv" as tool data in which the "screwdriver" performs the motion "tighten a screw" in region "X1Y3". If the sensor attached to the tool is an acceleration sensor, the tool data "a.csv" is time-series data of the detected acceleration or feature data derived from it (a text data file). A model that determines tool motion is built from the learning data 500. For example, machine learning such as deep learning is used to create a model that determines the tool motion from tool data acquired from the sensor 103 attached to the tool. The resulting model is saved in the tool motion learning result storage unit 24.
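To make the idea of learning from labeled tool data concrete, here is a deliberately small stand-in. The description names deep learning only as an example and does not specify features or a model, so everything below is an assumption: the two features (mean absolute acceleration and standard deviation), the nearest-centroid classifier used in place of a learned network, and the toy acceleration values, which are invented purely for illustration.

```python
import math
from statistics import mean, pstdev


def features(samples):
    """Reduce a one-axis acceleration series to (mean absolute value, std dev)."""
    return (mean(abs(s) for s in samples), pstdev(samples))


def train_centroids(labelled_series):
    """Average the feature vectors of each label: [(label, samples)] -> {label: centroid}."""
    grouped = {}
    for label, samples in labelled_series:
        grouped.setdefault(label, []).append(features(samples))
    return {
        label: tuple(mean(column) for column in zip(*vectors))
        for label, vectors in grouped.items()
    }


def classify(centroids, samples):
    """Return the label whose centroid is closest to the features of `samples`."""
    point = features(samples)
    return min(centroids, key=lambda label: math.dist(point, centroids[label]))


# Toy learning data (made-up values, not measurements).
learning_data = [
    ("tighten a screw", [0.1, -0.1, 0.1, -0.1]),
    ("drill a hole", [1.0, -1.0, 1.0, -1.0]),
]
model = train_centroids(learning_data)
print(classify(model, [0.2, -0.2, 0.3, -0.3]))
```

In practice the saved model would be whatever the chosen learning method produces; the dictionary of centroids here only plays the role of the stored learning result.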

The human behavior determination unit 15 applies the model saved in the human behavior learning result storage unit 14 to the video from the video acquisition unit 10, estimates the person's behavior, and outputs human behavior candidates. It is allowed to output more than one candidate. Similarly, the tool motion determination unit 25 applies the model saved in the tool motion learning result storage unit 24 to the tool data from the tool data acquisition unit 20, estimates the tool motion, and outputs tool motion candidates. It, too, is allowed to output more than one candidate.
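One simple way to let a determination unit emit several candidates is to keep every label whose model score clears a threshold. The description does not say how candidates are selected, so the per-label scores, the threshold value, and the function name below are all assumptions.

```python
def select_candidates(scores, threshold=0.2):
    """Return all labels scoring at or above the threshold, best first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [label for label, score in ranked if score >= threshold]


# Hypothetical scores produced by a model for one video segment.
scores = {"tighten a screw": 0.5, "loosen a screw": 0.3, "drill a hole": 0.1}
print(select_candidates(scores))
# ['tighten a screw', 'loosen a screw']
```

Passing several plausible candidates downstream, rather than only the top one, leaves room for the tool-side evidence to settle cases the video cannot.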

FIG. 5 shows a flowchart of the comprehensive human behavior determination unit 2. First, it checks whether the tool motion determination unit 25 has produced a result (S51). If there are no tool motion candidates from the tool motion determination unit 25, the human behavior candidates from the human behavior determination unit 15 are output as the result. If there are tool motion candidates, the human behavior candidates from the human behavior determination unit 15 are compared with the tool motion candidates from the tool motion determination unit 25 to determine whether they match (S52). If a human behavior candidate and a tool motion candidate match, the matching candidate is output as the human behavior determination result. If they do not match, the person's behavior is estimated from the human behavior candidates and the tool motion candidates, and the estimation result is output (S53).
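The three steps S51 to S53 can be sketched as a short function. The flowchart does not define what counts as a match or how the fallback estimate in S53 is computed, so this sketch assumes that two candidates match when their motions are equal and their regions overlap, and it takes the S53 estimator as a caller-supplied function. The dictionary layout and names are hypothetical.

```python
def matches(human, tool):
    """Assumed match rule: same motion and at least one shared region."""
    return human["motion"] == tool["motion"] and bool(human["regions"] & tool["regions"])


def integrate(human_candidates, tool_candidates, estimate):
    # S51: with no tool result, fall back to the video-only candidates.
    if not tool_candidates:
        return list(human_candidates)
    # S52: keep the human candidates confirmed by some tool candidate.
    matched = [
        h for h in human_candidates
        if any(matches(h, t) for t in tool_candidates)
    ]
    if matched:
        return matched
    # S53: no agreement, so estimate from both candidate lists.
    return estimate(human_candidates, tool_candidates)


human = [
    {"person": "person A", "regions": {"X1Y1", "X1Y2"}, "motion": "tighten a screw"},
    {"person": "person A", "regions": {"X2Y1", "X2Y2"}, "motion": "tighten a screw"},
]
tool = [{"tool": "screwdriver", "regions": {"X2Y1"}, "motion": "tighten a screw"}]
result = integrate(human, tool, estimate=lambda h, t: [])
print([sorted(r["regions"]) for r in result])
```

Leaving `estimate` pluggable reflects that S53 could be rule-based or a learned classifier.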

Because it is often difficult to determine the target person's behavior from video alone, the comprehensive human behavior determination unit 2 uses motion information from the tool to raise the determination accuracy. For example, FIG. 6A is a scene from a video in which "person A" is tightening a screw with "screwdriver 101a" in regions "X1Y1, X1Y2". Similarly, FIG. 6B is a scene in which "person A" is drilling a hole with "gimlet 101b" in regions "X2Y1, X2Y2", and FIG. 6C is a scene in which "person B" is loosening a screw with "screwdriver 101a" in regions "X2Y1, X2Y2". Because these scenes look alike, they are hard to tell apart even for a person judging by eye. Distinguishing such similar behaviors accurately by image processing is harder still. Moreover, the tool, which is an important basis for the determination, is often hidden behind the person, so it is difficult to keep it in view in the video at all times.

For example, suppose that person A is tightening screws with a screwdriver in roughly the lower-left half of the entire monitored area. In this embodiment, the video acquisition unit 10 first acquires video of the situation and the tool data acquisition unit 20 acquires the corresponding tool data; the human behavior determination unit 15 and the tool motion determination unit 25 then each make a determination using the model in their respective learning result storage units. Here, the human behavior definition is assumed to be as shown in FIG. 3 and the tool motion definition as shown in FIG. 4.

In this case, the human behavior determination unit 15 may output the following human behavior candidates: candidate 401, in which "person A" is tightening a screw in regions "X1Y1, X1Y2"; candidate 402, in which "person A" is tightening a screw in regions "X2Y1, X2Y2"; candidate 403, in which "person B" is drilling a hole in regions "X2Y1, X2Y2"; and candidate 405, in which "person A" is loosening a screw in regions "X2Y1, X2Y2".

When a person acts while moving, or when the work position straddles region boundaries of the monitored area in the human behavior definition, the human behavior determination unit 15 may output multiple human behavior candidates even for a single continuous task. In the example above, it would output both human behavior candidate 401 and human behavior candidate 402. In such cases, if the motion is continuous, only one of them may be output (for example, the one for the region at the time the candidate was estimated).

Meanwhile, the tool motion determination unit 25 may output motion candidate 502, in which the "screwdriver" is tightening a screw in region "X2Y1", and motion candidate 503, in which the "screwdriver" is tightening a screw in region "X2Y2". When the tool's position information drifts, or when the tool's position straddles region boundaries of the monitored area in the tool motion definition, the tool motion determination unit 25 may output multiple tool motion candidates even for a single continuous task. In such cases, as with the human behavior candidates, if the motion is continuous, only one of them may be output (for example, the one for the region at the time the candidate was estimated).

The comprehensive human behavior determination unit 2 first checks the human behavior candidates output by the human behavior determination unit 15 against the tool motion candidates output by the tool motion determination unit 25 and discards those that contradict them. For example, human behavior candidate 403 ("drill a hole") and human behavior candidate 405 ("loosen a screw") are eliminated because they are inconsistent with the output tool motion candidates. Next, as shown in FIG. 7, the human behavior candidates estimated by the human behavior determination unit 15 are mapped against the tool motion candidates estimated by the tool motion determination unit 25, and the human behavior estimation candidate is output based on their overlap. In this case, the human behavior estimation candidate is that "person A" is tightening a screw with "screwdriver 101a" in regions "X2Y1, X2Y2". The comprehensive human behavior determination unit 2 may also learn mapping situations such as the one in FIG. 7 as a classification problem, using machine learning such as deep learning, and estimate the human behavior that way.
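The worked example above can be traced in code. The candidate records are taken from the description; how overlap is measured in FIG. 7 is not specified, so counting shared region cells is an assumption of this sketch, as are the tuple layout and function name.

```python
# (person, regions, motion) for each human behavior candidate in the example.
HUMAN = {
    401: ("person A", {"X1Y1", "X1Y2"}, "tighten a screw"),
    402: ("person A", {"X2Y1", "X2Y2"}, "tighten a screw"),
    403: ("person B", {"X2Y1", "X2Y2"}, "drill a hole"),
    405: ("person A", {"X2Y1", "X2Y2"}, "loosen a screw"),
}
# (tool, regions, motion) for each tool motion candidate in the example.
TOOL = {
    502: ("screwdriver", {"X2Y1"}, "tighten a screw"),
    503: ("screwdriver", {"X2Y2"}, "tighten a screw"),
}


def estimate_by_overlap(human, tool):
    """Drop contradicting candidates, then keep those with the largest region overlap."""
    tool_motions = {motion for _, _, motion in tool.values()}
    # Step 1: remove human candidates whose motion no tool candidate supports.
    consistent = {k: v for k, v in human.items() if v[2] in tool_motions}
    # Step 2: count the region cells shared with tool candidates of the same motion.
    overlap = {}
    for key, (_, regions, motion) in consistent.items():
        shared = set()
        for _, tool_regions, tool_motion in tool.values():
            if tool_motion == motion:
                shared |= regions & tool_regions
        if shared:
            overlap[key] = len(shared)
    best = max(overlap.values(), default=0)
    return [key for key, count in overlap.items() if count == best]


print(estimate_by_overlap(HUMAN, TOOL))
# [402]
```

Candidates 403 and 405 fall out in step 1, candidate 401 shares no region with the tool candidates, and candidate 402 remains, which agrees with the result stated in the text.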

In addition, because this embodiment defines the regions in which each tool is used, misuse of tools and work errors can also be prevented. For example, if the tool motion definition is as shown in FIG. 4, the gimlet 101b is defined for use only in regions "X2Y2" and "X3Y1"; if the tool motion determination unit 25 or the comprehensive human behavior determination unit 2 detects use outside regions "X2Y2" and "X3Y1", a warning can be issued. Specifically, a warning is issued when the tool motion determination unit 25 determines that there is no applicable tool motion definition, or when the comprehensive human behavior determination unit 2 estimates that there is no applicable human behavior definition.
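The warning condition for tool use outside its defined regions can be sketched as a lookup. The allowed regions for the gimlet come from the description; the dictionary, the function name, and the warning text are hypothetical.

```python
# Regions in which each tool has a defined motion (gimlet entry from the text).
ALLOWED_REGIONS = {"gimlet": {"X2Y2", "X3Y1"}}


def check_tool_usage(tool, region, allowed=ALLOWED_REGIONS):
    """Return a warning string when no tool motion definition covers this use, else None."""
    if region not in allowed.get(tool, set()):
        return f"warning: no tool motion definition for {tool} in region {region}"
    return None


print(check_tool_usage("gimlet", "X1Y1"))
print(check_tool_usage("gimlet", "X2Y2"))
```

A tool that has no entry at all is treated the same way as a tool used in an undefined region, since in both cases no applicable definition exists.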

The present invention has been described using its application to a monitoring system on a production line as an example, but it is not limited to the described embodiment and includes various modifications. For example, the tool can be extended to any general implement or object used by the target person. The embodiment above has been described in detail to make the present invention easy to understand, and the invention is not necessarily limited to configurations that include every element described. Part of the configuration of one embodiment can be replaced with the configuration of another embodiment, and the configuration of another embodiment can be added to the configuration of one embodiment. For part of the configuration of each embodiment, other configurations can be added, deleted, or substituted.

1: human behavior estimation system; 2: comprehensive human behavior determination unit; 10: video acquisition unit; 11: video information storage unit; 12: human behavior definition; 13: human behavior learning unit; 14: human behavior learning result storage unit; 15: human behavior determination unit; 20: tool data acquisition unit; 21: tool data storage unit; 22: tool motion definition; 23: tool motion learning unit; 24: tool motion learning result storage unit; 25: tool motion determination unit; 101: tool; 103: sensor; 200: camera device.

Claims

1. A human behavior estimation system that determines a behavior performed by a person using a tool, comprising:
a video acquisition unit that acquires a video capturing the behavior;
a human behavior determination unit that, based on the video from the video acquisition unit, outputs, from a predetermined human behavior definition, human behavior candidates for the behavior captured in the video;
a tool data acquisition unit that acquires sensor information from a sensor attached to the tool;
a tool motion determination unit that, based on the sensor information from the tool data acquisition unit, outputs, from a predetermined tool motion definition, tool motion candidates for the tool from which the sensor information was acquired; and
a comprehensive human behavior determination unit that estimates the behavior captured in the video from the video acquisition unit, based on the human behavior candidates output by the human behavior determination unit and the tool motion candidates output by the tool motion determination unit.

2. The human behavior estimation system according to claim 1, wherein
the human behavior definition includes first area information indicating an area where the behavior is performed,
the tool motion definition includes second area information indicating an area where the tool is used, and
the comprehensive human behavior determination unit estimates the behavior captured in the video from the video acquisition unit by performing mapping based on the first area information of the human behavior candidates and the second area information of the tool motion candidates.

3. The human behavior estimation system according to claim 2, wherein
the first area information is defined by dividing the video space of the video.

4. The human behavior estimation system according to claim 2, wherein
a warning is issued when the tool motion determination unit determines that there is no applicable tool motion definition for the tool from which the sensor information was acquired, or when the comprehensive human behavior determination unit estimates that there is no applicable human behavior definition for the behavior captured in the video.

5. The human behavior estimation system according to claim 1, wherein
the human behavior determination unit outputs the human behavior candidates for the behavior captured in the video using a learning model obtained by machine learning on learning data in which the human behavior definition is associated with the video from the video acquisition unit.

6. The human behavior estimation system according to claim 1, wherein
the tool motion determination unit outputs the tool motion candidates for the tool from which the sensor information was acquired using a learning model obtained by machine learning on learning data in which the tool motion definition is associated with the sensor information from the tool data acquisition unit.