JP2005250708A

JP2005250708A - Tool motion recognition device and tool motion recognition method

Info

Publication number: JP2005250708A
Application number: JP2004058301A
Authority: JP
Inventors: Hidetomo Sakaino; 英朋境野; Yutaka Yanagisawa; 豊柳沢; Tetsuji Sato; 哲司佐藤
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2004-03-03
Filing date: 2004-03-03
Publication date: 2005-09-15
Anticipated expiration: 2024-03-03
Also published as: JP4102318B2

Abstract

[Problem] To enable highly accurate recognition of which tool is being used in a motion scene in which a person operates various tools by hand, under conditions where occlusion and discontinuity exist.
[Solution] An image input unit 11 inputs frames of a motion scene in which a person operates various tools, and an image storage unit 12 stores the input frames. A velocity estimation unit 13 estimates velocity vectors from the motion scene, and a symbol time-series generation unit 14 generates a symbol time series from the estimated velocity vectors using a symbol conversion table 141. A learning unit 15 computes and learns HMM model parameters using the generated symbol time series. For a newly input motion scene, a recognition unit 16 identifies which tool operation is being performed, based on the velocity vectors estimated by the velocity estimation unit 13 and using the learning result of the learning unit 15, and an output unit 17 outputs the identification result of the recognition unit 16.
[Selected drawing] Figure 1

Description

The present invention relates to tool motion recognition technology, and in particular to a tool motion recognition device and a tool motion recognition method that learn and recognize, as patterns, which tool a person has operated in a time-series scene input from a web camera or an ordinary camera, without using shape knowledge about the tool.

Research on recognizing human gestures and motions from camera scenes obtained from the web and various other observation sources is being actively pursued. In addition, camera-based surveillance systems and networks covering remote and multiple locations are increasingly being built, and technology that can recognize even complex motions in complex environments is desired. However, little research has been done so far on recognizing natural motion without constraints. In conventional gesture research, for example, the set of commands corresponding to motions is restricted in advance, so the interaction lacks naturalness.

If knowledge of tools is used, the number of combinations becomes enormous: everyday surroundings include pencils and erasers, with motions such as writing and erasing, and both the tools and the ways of using them are highly varied. When a person recognizes the motion of someone far away, it is unlikely that they do so by having learned every combination. Moreover, there are many kinds of basic motions depending on the object and the tool, such as cutting (material), sweeping (a room), and wiping (a desk), yet humans can easily recognize them even when the tool itself is not fully visible because it is occluded by a hand or arm.

For a robot's vision, however, such recognition is not easy. To identify motions under real-environment and lighting changes, a robot must adequately learn about occlusion, tools, and motions. Recognition is even harder when the compositions are similar, as when writing with a pen versus turning a screw with a screwdriver. If such recognition technology is established, robots should be able to smoothly perform tasks such as assisting people based on a visual assessment of the situation.

Automatically recognizing human motion in detail through remote monitoring also opens up a wide range of applications, such as detecting unnatural motions and anticipating crimes that involve tools. Against this background, recognition of gestures such as body and hand movements, and recognition of motions involving tools, are being actively studied as important themes.

Starner et al. proposed recognizing 40 types of ASL hand gestures with an HMM (Hidden Markov Model) based on colored gloves or skin-color features (see Non-Patent Document 1).

Fels et al. used a data glove and trained a multilayer NN (Neural Network) on the hand-movement information sensed by it (see Non-Patent Document 2).

Siskind et al. conducted experiments on the assumption that human motion perception relies not on recognizing objects but on visual trajectories, independently of the objects (see Non-Patent Document 3). In the experiments, a color model and an HMM were used to analyze six movements: pick up, put down, push, pull, drop, and throw.

Bobick et al. performed recognition by DTW (Dynamic Time Warping) based on the principal curvature of hand trajectories and eigenimages of the hand, using a data glove with a magnetic sensor (see Non-Patent Document 4). They later extended this to trajectory learning and recognition with HMMs (see Non-Patent Document 5).

Lee et al. applied a method that uses hand skin color to learn and recognize 10 types of gestures with an HMM to basic commands used in presentations, such as start, first, and next (see Non-Patent Document 6).

To improve generality, Yang et al. proposed a method that, without using a data glove, generates hand trajectories extracted on the basis of skin color and motion segmentation, and learns and recognizes them with a TDNN (Time Delayed Neural Network) (see Non-Patent Document 7). They experimented with 40 types of ASL gestures and obtained a high recognition rate.

The methods described above have two problems: they are difficult to apply to recognizing operations with a hand-held tool because of occlusion, and methods based on background subtraction cannot cope with changes in the real environment or in lighting.

To address lighting changes, Bobick et al. proposed a method that builds an accumulated image of difference images from video of motions such as arm swings and uses it for motion recognition (see Non-Patent Document 8). It has been shown to allow fast recognition when posture changes greatly, as in aerobics. However, for fine motions within a limited range, such as those of a screwdriver and a gimlet handled by the present invention as described later, the resulting images are highly similar and recognition becomes difficult.

Yamato et al. proposed applying HMMs to distinguish four forms (volley, etc.) in tennis play, as an example of motion recognition with a hand-held tool (see Non-Patent Document 9). The background difference is computed for each input image, and after silhouette image generation, binarization, and vector quantization, learning is performed with 36 symbol patterns. The posture of the player, including the limbs, and the position of the racket are thought to have contributed to recognition. Motion features are not used. Because this method generates silhouette images, it remains susceptible to sunlight and environmental changes. A further problem is that the mapping from images to symbols, which most strongly affects the recognition rate, requires experience and a great deal of time for selecting representative patterns.

Duric et al. attempted a functional analysis of tools, as seen from the camera's line of sight, while carpentry tools are operated by hand (see Non-Patent Document 10). The method of Non-Patent Document 10 takes four types of hand-held carpentry tools (shovel, monkey wrench, knife, spanner) and estimates pseudo-three-dimensional optical flow from a monocular camera based on the motion flow of their contours. Motion parameters were obtained from the normal flow, and the motion of each tool was analyzed. Even for the same tool, the parameters evolve differently over time depending on use: for example, a monkey wrench may be used to strike like a hammer instead of turning a screw as intended, and a kitchen knife may be used to stab as well as to cut back and forth. Non-Patent Document 10 clarifies and classifies the functionality of each tool, but the model does not take the influence of hand and arm motion into account. Nor were time-series pattern recognition experiments carried out.

Techniques relating to optical flow that are relevant to the embodiments of the present invention are described in Non-Patent Documents 11 to 14 below.
Non-Patent Document 1: T. E. Starner, J. Weaver, and A. Pentland, "Real-time American sign language recognition using desk and wearable computer based video", IEEE Trans. PAMI, vol. 20, no. 12, pp. 1371-1375, 1998.
Non-Patent Document 2: S. S. Fels and G. E. Hinton, "Glove-talk: a neural network interface which maps gestures to parallel format speech synthesizer controls", IEEE Trans. Neural Network, vol. 9, no. 1, pp. 205-212, 1997.
Non-Patent Document 3: J. M. Siskind and Q. Morris, "A maximum-likelihood approach to visual event classification", Proc. Fourth European Conf. Computer Vision, pp. 347-360, 1996.
Non-Patent Document 4: A. F. Bobick and A. D. Wilson, "A state-based approach to the representation and recognition of gesture", IEEE Trans. PAMI, vol. 19, no. 12, pp. 1325-1337, 1997.
Non-Patent Document 5: A. D. Wilson and A. F. Bobick, "Parametric Hidden Markov Models for gesture recognition", IEEE Trans. PAMI, vol. 21, no. 9, pp. 884-900, 1999.
Non-Patent Document 6: H. K. Lee and J. H. Kim, "An HMM-based threshold model approach for gesture recognition", IEEE Trans. PAMI, vol. 21, no. 10, pp. 961-973, 1999.
Non-Patent Document 7: M. H. Yang, N. Ahuja, and M. Tabb, "Extraction of 2D motion trajectories and its application to hand gesture recognition", IEEE Trans. PAMI, vol. 24, no. 8, pp. 1061-1074, 2002.
Non-Patent Document 8: A. F. Bobick and J. W. Davis, "The Recognition of Human Movement Using Temporal Templates", IEEE Trans. PAMI, vol. 23, no. 3, pp. 257-267, 2001.
Non-Patent Document 9: J. Yamato, J. Ohya, and K. Ishii, "Recognizing human action in time-sequential images using Hidden Markov Model", Proc. Computer Vision and Pattern Recognition, pp. 379-385, 1992.
Non-Patent Document 10: Z. Duric, J. A. Fayman, and E. Rivlin, "Function from motion", IEEE PAMI, vol. 18, no. 6, pp. 579-591, 1996.
Non-Patent Document 11: B. D. Lucas and T. Kanade, "An iterative image registration technique with an application in stereo vision", IJCAI-81, pp. 674-679.
Non-Patent Document 12: A. Bab-Hadiashar and D. Suter, "Robust optic flow computation", International Journal of Computer Vision, 29, 1, pp. 59-77, 1998.
Non-Patent Document 13: E. P. Ong and M. Spann, "Robust optical flow computation based on least-median-of-squares regression", International Journal of Computer Vision, 31, 1, pp. 51-82, 1999.
Non-Patent Document 14: N. Cornelius and T. Kanade, "Adapting optical flow to measure object motion in reflectance and X-ray image sequences", ACM SIGGRAPH/SIGART Interdisciplinary Workshop on Motion: Representation and Perception, Toronto, Canada, 1983.

The prior art described in Non-Patent Documents 1 to 10 has the following problems.
(1) Most of it is gesture research in which no tool is held, and it involves unnatural requirements such as wearing a data glove or color-marking the fingers.
(2) Recognition of cases such as finely manipulating a screwdriver with the fingertips has not been sufficiently studied. That is, learning and recognition under occlusion of the tool by the hand or arm, and under discontinuity, have not been adequately examined.
(3) For HMMs using image patterns, performance has not been sufficiently evaluated with respect to the number of input symbols, the number of output symbols, and the recognition rate.
(4) Methods for converting image patterns to output symbols have hardly been studied.
(5) Motion estimation from video has not used optical flow methods that are robust to noise and illumination variation.

The problems of the prior art are now described more concretely. FIG. 11 shows images of a person operating a hand-held tool. FIG. 11(A) shows turning a screwdriver, FIG. 11(B) turning a gimlet, FIG. 11(C) striking with a hammer, and FIG. 11(D) pulling a saw.

At first glance, distinguishing the motions of the four tools seems easy, but similarities in both shape and motion complicate the problem. As for shape, even if the shape of each tool is stored in advance and used as a reference, the tool is hidden by the hand or arm, so identification by matching is difficult.

As for motion, turning a screwdriver almost never produces a clean periodic component, even for the same operator; the motion varies, which makes it hard to establish motion correspondences. Furthermore, the gimlet and the screwdriver show similar movements centered on rotation. The hammer and the saw are similar in that back-and-forth movement is mixed with rotation about the wrist or the object being cut. In addition, because each tool has sharp edges, the optical flow near the edges is discontinuous, which lowers estimation accuracy.

As described above, no technique had been established for estimating the fine movement (optical flow) of each operation with high accuracy in the presence of occlusion and discontinuity. Real-environment scenes also contain environmental changes such as noise and luminance variation.

The present invention aims to solve the above problems of the prior art and to provide a tool motion recognition technique that recognizes with high accuracy, in the presence of occlusion and discontinuity, which tool is being used in a motion scene in which a person operates various tools by hand.

Representative aspects of the present invention are outlined below.

(1) The present invention is a tool motion recognition system that learns, from time-series images (scenes) obtained from the web and various observation sources, the motions of a person using various tools, and performs recognition using the learning results. An image input unit inputs time-series images of various kinds of image information; an image storage unit stores them; a velocity estimation unit estimates, by optical flow, velocity vectors indicating motion in the scene; a learning unit computes HMM model parameters for the various tool operation scenes; a recognition unit classifies and identifies which tool operation is being performed in a newly input scene; and an output unit presents the result.

(2) In the velocity estimation unit of (1), an objective function is set up within the optical flow framework, and the velocity components and the luminance variation component are estimated as unknowns by the least squares method.

(3) In the objective function of (2), a nonlinear robust function such as the Lorentzian function or the biweight function is applied, and each unknown is estimated by minimizing the resulting nonlinear function.

(4) The minimization of the nonlinear function in (3) uses the steepest descent method, a neural network, the Levenberg-Marquardt method, or the like.

(5) The variance value contained in the Lorentzian function or the like of (3) is varied stepwise from a large value to a small value during the minimization process.

(6) In the learning unit of (1), a conversion correspondence diagram, set in advance for converting velocity vector information into symbols, is used to convert the velocity vectors estimated by the velocity estimation unit into the input symbols required for computing the HMM model parameters, and these symbols are used for learning.

(7) In the learning unit of (6), a concentric pattern is used for the conversion correspondence diagram.
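The concentric conversion diagram of (6) and (7) can be pictured as a quantizer over the velocity plane. The following is a minimal sketch, not the table actually used in the patent: it assumes that rings are split by speed and sectors by direction, and the function name `velocity_to_symbol`, the ring radii, and the sector count are hypothetical choices made only for illustration.

```python
import math
from typing import List, Sequence, Tuple


def velocity_to_symbol(u: float, v: float,
                       ring_edges: Sequence[float] = (0.5, 2.0, 4.0),
                       n_sectors: int = 8) -> int:
    """Map a velocity (u, v) to a symbol on an assumed concentric pattern.

    Symbol 0 is the innermost disc (speed below ring_edges[0]).
    Ring k covers speeds from ring_edges[k] up to the next edge; the last
    ring is unbounded. Each ring is split into n_sectors direction bins.
    """
    speed = math.hypot(u, v)
    if speed < ring_edges[0]:
        return 0
    # Ring index = number of edges at or below the speed, minus one.
    ring = sum(1 for r in ring_edges if speed >= r) - 1
    # Direction in [0, 2*pi); the final modulo guards against rounding up to n_sectors.
    angle = math.atan2(v, u) % (2.0 * math.pi)
    sector = int(angle / (2.0 * math.pi) * n_sectors) % n_sectors
    return 1 + ring * n_sectors + sector


def to_symbol_series(velocities: Sequence[Tuple[float, float]]) -> List[int]:
    """Convert a sequence of average velocity vectors into a symbol time series."""
    return [velocity_to_symbol(u, v) for u, v in velocities]
```

With these illustrative settings there are 1 + 3 × 8 = 25 symbols, and a sequence of per-frame average velocity vectors becomes a symbol time series by applying the function frame by frame.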

Specifically, the present invention is a method by which a computer recognizes the motion of a person using a tool. Time-series images obtained from an observation source are input, motion vectors are estimated by the optical flow method from consecutive frames of the motion scene, and disturbed vectors are removed to obtain an average velocity vector. This velocity vector is mapped onto, for example, a concentric conversion correspondence diagram, and an HMM is trained for each tool to be recognized. After learning, a new unknown scene is input, the HMM computation is performed, and the motion with the smallest likelihood is taken as the recognition result.

According to the present invention, it becomes possible to recognize with high accuracy, in the presence of occlusion and discontinuity, which tool is being used in a motion scene in which a person operates various tools by hand.

Embodiments of the present invention are described in detail below. FIG. 1 is a system configuration diagram showing the overall processing of the present invention. In the tool motion recognition device 1, 11 is an image input unit that inputs time-series images (frames) of a motion scene in which a person operates various tools by hand; 12 is an image storage unit that stores the input frames; 13 is a velocity estimation unit that estimates velocity vectors from the motion scene; 14 is a symbol time-series generation unit that generates a symbol time series from the velocity vectors; 15 is a learning unit that computes and learns the HMM model parameters for each tool from the generated symbol time series; 16 is a recognition unit that recognizes, based on the learning results of the learning unit 15, which tool operation is being performed in a newly input motion scene; 17 is an output unit that outputs the recognition result of the recognition unit 16; 100 is a motion database (DB) that stores the HMM model parameters for each tool; and 141 is a symbol conversion table that stores, in table form, the information of the conversion correspondence diagram used to convert velocity vectors into symbols.

Through the image input unit 11, a motion scene whose motion is known, showing a person operating a hand-held tool, is input from a source such as a web camera or an ordinary video camera, and the time-series images of the input motion scene are stored in the image storage unit 12. The velocity estimation unit 13 estimates velocity vectors from the motion scene of the stored time-series images; the symbol time-series generation unit 14 converts the velocity vectors into symbols using the symbol conversion table 141; and the learning unit 15 computes HMM model parameters from the converted symbols and trains the HMM. In this way, an HMM is trained for each tool to be recognized, so that there are as many HMMs as tool types, and these are stored in the motion DB 100.

For a newly input scene whose motion is unknown, the recognition unit 16 inputs the symbol time series to the previously trained HMMs in the motion DB 100 and recognizes the motion corresponding to the HMM with the smallest likelihood as the motion of the input scene; the output unit 17 then outputs which motion it is as the result.
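The recognition step can be sketched as scoring one symbol series against every per-tool HMM. The code below is only an illustration under stated assumptions: each HMM is a discrete model given as a hypothetical tuple `(pi, A, B)`, scoring uses the scaled forward algorithm, and the tool names are invented. The text selects the HMM with the "smallest likelihood"; this sketch assumes that the score being minimized is a negative log-likelihood, which is equivalent to the usual choice of the model with the largest likelihood. That reading is an assumption, not something the text states.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical container: (initial probabilities, transition matrix, emission matrix).
HMM = Tuple[List[float], List[List[float]], List[List[float]]]


def log_likelihood(hmm: HMM, symbols: Sequence[int]) -> float:
    """Scaled forward algorithm: returns log P(symbols | hmm)."""
    pi, A, B = hmm
    n = len(pi)
    if not symbols:
        return 0.0
    log_p = 0.0
    alpha = [pi[i] * B[i][symbols[0]] for i in range(n)]
    for t in range(len(symbols)):
        if t > 0:
            # Propagate through the transitions, then weight by the emission.
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][symbols[t]]
                     for j in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the model cannot produce this series
        log_p += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_p


def recognize(models: Dict[str, HMM], symbols: Sequence[int]) -> str:
    """Return the tool whose HMM has the smallest negative log-likelihood."""
    costs = {name: -log_likelihood(hmm, symbols) for name, hmm in models.items()}
    return min(costs, key=costs.get)
```

Working in log space with per-step rescaling keeps long symbol series from underflowing, which matters when one scene spans many frames.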

As stated earlier, the fine movement (optical flow) of each operation must be estimated with high accuracy in the presence of occlusion and discontinuity, and environmental changes such as noise and luminance variation in real-environment scenes must also be handled. The present invention therefore introduces, for velocity vector estimation, an optical flow method based on a luminance variation model and robust estimation.

As for the motion recognition method, a characteristic of tool use is that the motion is not clearly periodic: as a tool continues to be used, conditions such as how tight the screw is, how far the cut has progressed, and the resistance to drilling change from moment to moment. For this reason, a time-series pattern classification method based on HMMs is adopted; HMMs tolerate stretching and compression along the time axis and have a strong track record in speech processing.

The derivation of the optical flow method is described below. Many optical flow methods have been proposed. Among them, the method of Lucas et al. described in Non-Patent Document 11, known as a region-based method, is widely used because of its accuracy and stability. However, it is a brightness constancy model and therefore does not accommodate illumination changes. In addition, because it is solved by least squares, its accuracy is degraded by outliers and discontinuous components.

For this reason, robust estimation has been applied to improve accuracy, as described in Non-Patent Documents 12 and 13. However, both of these methods are brightness constancy models, so when the luminance varies greatly between frames, robust estimation alone has limited effect.

The present invention therefore adopts a solution by robust estimation, based on the model proposed by Cornelius et al. in Non-Patent Document 14, which allows luminance to change linearly between frames. A nonlinear Lorentzian function was chosen as the robust function. The estimation of the unknowns thus becomes a nonlinear least squares problem.
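To see why a Lorentzian penalty tolerates outliers better than a squared error, the short sketch below compares the two. The exact form of the function is not given in this text; the code assumes the commonly used form log(1 + err² / (2σ²)), and the function names are hypothetical.

```python
import math


def quadratic(err: float) -> float:
    """Ordinary least-squares penalty, shown for comparison."""
    return err * err


def lorentzian(err: float, sigma: float) -> float:
    """Assumed Lorentzian penalty: log(1 + err^2 / (2 * sigma^2))."""
    return math.log(1.0 + (err * err) / (2.0 * sigma * sigma))


def lorentzian_grad(err: float, sigma: float) -> float:
    """Derivative of the assumed Lorentzian penalty with respect to err."""
    return 2.0 * err / (2.0 * sigma * sigma + err * err)


if __name__ == "__main__":
    for e in (0.5, 5.0, 50.0):
        print(e, quadratic(e), lorentzian(e, 1.0), lorentzian_grad(e, 1.0))
```

The penalty grows only logarithmically, and its derivative shrinks for large residuals, so pixels near an occlusion boundary or a tool edge pull on the estimate far less than they would under least squares.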

For simplicity, let the sampling time be 1.0, and let X be the two-dimensional position vector in the n-th frame, U the velocity vector, and I(X, n) the intensity value. The linear luminance change model is then written as
I(X + U, n + 1) = I(X, n) + b(X, n)    (1)
where the components of the position vector are X = (x, y) and those of the velocity vector are U = (u, v).

Next, applying a Taylor expansion approximation to equation (1) around the vector X yields equation (2).
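Equation (2) itself is not reproduced in this text. One plausible reading, assuming a first-order expansion and writing I_x, I_y, I_t for the partial derivatives of I with respect to x, y, and the frame index (shorthand that is not defined in the text), is the following. The residual name err matches its later use in equation (3), but its exact definition here is an assumption.

```latex
% Assumed first-order expansion of the left-hand side of equation (1)
I(X + U,\, n + 1) \approx I(X, n) + I_x u + I_y v + I_t
% Substituting into equation (1) and cancelling I(X, n)
I_x u + I_y v + I_t - b = 0
% Residual minimized over the window (assumed definition)
\mathrm{err} = I_x u + I_y v + I_t - b
```

Setting b = 0 recovers the usual brightness constancy constraint, which is how the linear model extends the constant-brightness methods discussed above.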

The two velocity components and the coefficient are handled on discretized grid points. The time direction is divided into n steps, and spatially the M × N image (window) is divided with step widths
h_x = 1.0, h_y = 1.0.
Here i and j are integers with 0 ≤ i ≤ M and 0 ≤ j ≤ N. The position vector is written in discrete form as
X_{i,j}^n = (i h_x, j h_y)^n
and the velocity vector as
U_{i,j}^n = (u_{i,j}, v_{i,j})^n.
To obtain the first-order derivative terms, such as the spatial and temporal terms, a discretized approximation is made at the pixel points by the finite difference method. The image intensity and each coefficient b are likewise expressed at each pixel at time n and position (i, j).
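The difference formulas themselves are not reproduced in this text. As an illustration only, the sketch below assumes central differences in space and a forward difference in time, with h_x = h_y = 1 and unit sampling time. The function name `derivatives` and the row-major frame layout are hypothetical.

```python
from typing import List, Tuple

Frame = List[List[float]]  # frame[j][i]: intensity at row j, column i


def derivatives(f0: Frame, f1: Frame, i: int, j: int) -> Tuple[float, float, float]:
    """Approximate (I_x, I_y, I_t) at interior pixel (i, j).

    f0 is frame n and f1 is frame n + 1. Spatial terms use central
    differences on frame n; the temporal term is a forward difference.
    """
    ix = (f0[j][i + 1] - f0[j][i - 1]) / 2.0
    iy = (f0[j + 1][i] - f0[j - 1][i]) / 2.0
    it = f1[j][i] - f0[j][i]
    return ix, iy, it
```

For an intensity ramp I = 2i + 3j that brightens by 5 between frames, this returns (2.0, 3.0, 5.0) at any interior pixel.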

The objective function for minimizing the error err of equation (2) within the window is defined as
E = Σ ρ(err) (3)
where ρ is the nonlinear robust function. For this expression to attain a minimum, its first derivatives with respect to the three unknowns must be zero; that is, the solution is sought so that
∂E/∂u = 0, ∂E/∂v = 0, ∂E/∂b = 0 (4)

Here the steepest descent method is applied, and the three unknowns are estimated as one set per pixel from a sub-block of several to a dozen or so pixels square. Equation (5) is computed iteratively for the three unknowns u, v, and b (denoted collectively as w), and is expressed in terms of the iteration count p and the adjustment parameter μ. The adjustment parameter μ is determined empirically.
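A generic steepest-descent update consistent with this description is sketched below. Only the iteration count p and the adjustment parameter μ are named in the text, so the superscript placement and the point at which the gradient is evaluated are assumptions.

```latex
w^{(p+1)} = w^{(p)} - \mu \left.\frac{\partial E}{\partial w}\right|_{w = w^{(p)}},
\qquad w \in \{u,\, v,\, b\}
```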

The three first derivatives required in equation (5) are obtained by the chain rule. Provided that the nonlinear robust function ρ has a first derivative, each of the three first derivatives can be expressed in terms of that derivative.
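The estimation for a single window can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the residual is taken to be err = I_x·u + I_y·v + I_t − b, the Lorentzian is taken in the common form ρ(e) = log(1 + e²/(2σ²)), and the function names, the scale σ, the step size and the iteration count are all hypothetical choices. By the chain rule, each gradient component is the sum over the window of ρ'(err) multiplied by the derivative of err with respect to that unknown.

```python
import random


def lorentzian_psi(err, sigma):
    """First derivative of the assumed rho(e) = log(1 + e^2 / (2 * sigma^2))."""
    return 2.0 * err / (2.0 * sigma * sigma + err * err)


def estimate_window(ix, iy, it, mu=0.005, sigma=1.0, iters=3000):
    """Steepest descent on E = sum(rho(err)) over one window.

    ix, iy, it: spatial and temporal intensity derivatives, one value per pixel.
    Returns the estimated (u, v, b) for the window.
    """
    u = v = b = 0.0
    for _ in range(iters):
        grad_u = grad_v = grad_b = 0.0
        for gx, gy, gt in zip(ix, iy, it):
            err = gx * u + gy * v + gt - b   # residual of the linearized model
            psi = lorentzian_psi(err, sigma)
            grad_u += psi * gx               # chain rule: d(err)/du = I_x
            grad_v += psi * gy               # chain rule: d(err)/dv = I_y
            grad_b -= psi                    # chain rule: d(err)/db = -1
        u -= mu * grad_u
        v -= mu * grad_v
        b -= mu * grad_b
    return u, v, b


# Synthetic 7x7 window (49 pixels) whose derivatives are exactly consistent
# with a known motion and brightness offset; all values are made up.
random.seed(0)
ix = [random.gauss(0.0, 1.0) for _ in range(49)]
iy = [random.gauss(0.0, 1.0) for _ in range(49)]
true_u, true_v, true_b = 0.3, -0.2, 0.1
it = [true_b - gx * true_u - gy * true_v for gx, gy in zip(ix, iy)]

u, v, b = estimate_window(ix, iy, it)
print(round(u, 3), round(v, 3), round(b, 3))
```

A fixed step size keeps the sketch short. In practice μ has to be small relative to the window size and gradient magnitudes, which is in line with the statement that μ is determined empirically.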

FIG. 2 illustrates the derivation of the average velocity vector. It shows the process of inputting two consecutive frames of a motion, detecting motion vectors, and obtaining the average velocity vector. As shown in FIG. 2(A), two consecutive frames of a motion scene are input; as shown in FIG. 2(B), motion vectors are estimated by the optical flow method; and as shown in FIG. 2(C), disturbed vectors are removed by SVD (Singular Value Decomposition), giving the average velocity vector shown in FIG. 2(D).

FIG. 3 shows an example of a 2-state, 25-output HMM. The widely used left-to-right HMM is employed here, and two models are selected: 2 states with 17 outputs and 2 states with 25 outputs. The difference in performance between the two is shown experimentally, as described later. In FIG. 3, s1 and s2 denote states, a_ij denotes a state transition probability, and b_ij denotes a symbol output probability.

FIG. 4 shows examples of conversion correspondence diagrams for converting an average velocity vector into a symbol. Two schemes are shown, rectangular and concentric. The symbol conversion table 141 holds the correspondence between average velocity vectors and symbols represented by such a diagram. Using the symbol conversion table 141, each average velocity vector is converted into a symbol, a symbol time series is generated from the motion pattern during operation, and HMM model parameters (see, for example, Non-Patent Document 5) are calculated and learned for each tool. The symbol conversion table 141 is used in the same way during recognition.

Several methods have been proposed for converting an average velocity vector into a symbol, but which features to select and how to convert them have not been sufficiently studied. Here, the two-dimensional information of each frame was converted into one-dimensional information as follows, and comparative experiments were carried out.

That is, in this embodiment, conversion correspondence diagrams such as those in FIG. 4 are applied to the magnitude and direction of the average velocity vector obtained from each frame. Preliminary experiments showed that it is desirable to make the symbol density higher near the center of the diagram. Each frame is mapped to one symbol. As for the number of divisions and the shape, the designs prepared include, for example, the rectangular layout with 25 output symbols shown in FIG. 4(A) and the circular layout with 17 output symbols shown in FIG. 4(B). The advantages and disadvantages of the two output symbol patterns in FIG. 4 were verified by recognition experiments. In the example of FIG. 4(A) the average velocity vector is assigned symbol 17, and in the example of FIG. 4(B) it is assigned symbol 7.
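The mapping from an average velocity vector to a single symbol can be sketched as follows. The actual cell boundaries and symbol numbering of FIG. 4 are not given in the text, so everything concrete here is hypothetical: the concentric layout is assumed to be one central cell plus two rings of eight direction sectors (1 + 8 + 8 = 17 symbols), the rectangular layout is assumed to be a 5 × 5 grid with narrower bins near the center (25 symbols), and the radii, bin edges, and function names are illustrative only.

```python
import bisect
import math


def concentric_symbol(u, v, r_inner=0.5, r_outer=2.0, sectors=8):
    """Map (u, v) to a symbol in 1..17 using an assumed concentric layout."""
    mag = math.hypot(u, v)
    if mag < r_inner:
        return 1                                   # central cell: near-zero motion
    width = 2.0 * math.pi / sectors
    angle = math.atan2(v, u) % (2.0 * math.pi)     # direction in [0, 2*pi)
    # Shift by half a sector so that sector 0 is centered on the +u axis.
    sector = int(((angle + width / 2.0) % (2.0 * math.pi)) // width)
    sector = min(sector, sectors - 1)
    ring = 0 if mag < r_outer else 1               # inner ring or outer ring
    return 2 + ring * sectors + sector


def rectangular_symbol(u, v, edges=(-2.0, -0.5, 0.5, 2.0)):
    """Map (u, v) to a symbol in 1..25 using an assumed 5 x 5 grid."""
    col = bisect.bisect_right(edges, u)            # 0..4
    row = bisect.bisect_right(edges, v)            # 0..4
    return 1 + row * (len(edges) + 1) + col


def to_symbol_series(vectors, mapper=concentric_symbol):
    """Convert one average velocity vector per frame into a symbol time series."""
    return [mapper(u, v) for u, v in vectors]


# Made-up average velocity vectors, one per frame.
series = to_symbol_series([(0.1, 0.1), (1.0, 0.0), (0.0, 1.0), (-3.0, 0.0)])
print(series)
```

Placing the finer cells at small magnitudes reflects the observation that a higher density near the center is desirable; the specific thresholds would need to be tuned to the frame rate and image scale.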

FIG. 5 shows an example of generating a symbol time series from time-series images of a hammer operation. From time-series images of nailing with a hammer as in FIG. 5(A), velocity vectors are obtained through the optical flow shown in FIG. 5(B) and converted using the symbol conversion table 141, which corresponds to the conversion correspondence diagram shown in FIG. 5(C). As FIG. 5(C) shows, the direction and magnitude of the velocity vector changed until the hammer reached the nail. As a result, the output symbol sequence was (7, 7, 7, 8, 1, 4), as shown in FIG. 5(D).

FIG. 6 illustrates an example of HMM learning for four types of tools. For example, the screwdriver motion is learned: from the screwdriver symbol time series (14, 14, 12, 7, 6, 6, 6, ...) shown in FIG. 6(A), the B-W (Baum-Welch) algorithm calculates HMM model parameters consisting of a transition probability matrix A and an output probability matrix B, as shown in FIG. 6(B). These HMM model parameters are recorded in the motion DB 100 for each tool.

By performing this HMM learning for each of the four types of tools, HMMs for four types (categories), namely screwdriver, hammer, gimlet, and saw, are generated in the motion DB 100, as shown in FIG. 6(C).

FIG. 7 shows the flow of motion recognition from a time-series pattern of symbols. For example, when an unknown symbol time series (14, 2, 4, 12, 7, 2, 12) as shown in FIG. 7(A) is input, its log-likelihood is calculated with each of the four HMMs stored in the motion DB 100, as shown in FIG. 7(B). The tool corresponding to the smallest calculated log-likelihood is then recognized as the tool being operated in the motion scene of the input symbol time-series pattern. In this example, log-likelihood values from 219.675661 to 819.264267 were calculated, and the screwdriver, which had the smallest log-likelihood (219.675661), is recognized as the tool being operated in the motion scene of the input symbol time-series pattern.
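The recognition step can be sketched as scoring the input symbol series against each stored HMM with the forward algorithm and choosing the best-scoring tool. The sketch below is illustrative only: the model parameters are made-up toy values with three output symbols rather than 17 or 25, symbols are 0-based indices, and the function names are hypothetical. It selects the model with the largest log-likelihood; this corresponds to choosing the smallest value when scores are reported as positive magnitudes (negated log-likelihoods), which is assumed here to be the convention behind the figures quoted above.

```python
import math


def log_likelihood(symbols, pi, A, B):
    """Log P(symbols | model) for a discrete HMM via the scaled forward algorithm.

    pi: initial state probabilities, A[i][j]: transition probability i -> j,
    B[j][k]: probability that state j outputs symbol k (0-based).
    """
    n = len(pi)
    alpha = [pi[i] * B[i][symbols[0]] for i in range(n)]
    loglik = 0.0
    for t in range(len(symbols)):
        if t > 0:
            alpha = [
                sum(alpha[i] * A[i][j] for i in range(n)) * B[j][symbols[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")          # series is impossible under this model
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return loglik


def recognize(symbols, models):
    """Return (best tool name, per-tool log-likelihoods)."""
    scores = {name: log_likelihood(symbols, *params) for name, params in models.items()}
    return max(scores, key=scores.get), scores


# Two toy 2-state left-to-right models (hypothetical parameter values).
left_to_right = [[0.7, 0.3], [0.0, 1.0]]
models = {
    "screwdriver": ([1.0, 0.0], left_to_right, [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]]),
    "hammer": ([1.0, 0.0], left_to_right, [[0.1, 0.1, 0.8], [0.1, 0.8, 0.1]]),
}
best, scores = recognize([0, 0, 1, 1], models)
print(best, scores)
```

In a full system the dictionary would hold one trained parameter set per tool, as stored in the motion DB 100 after Baum-Welch learning.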

FIG. 8 shows the learning process of the B-W algorithm when a single operator operated each of the four tools. The vertical axis represents the log-likelihood and the horizontal axis the number of iterations. In every case the likelihood had almost converged, within the range of -330 to -430, after 50 or more iterations. The log-likelihood for the gimlet converged to the smallest value. Other subjects showed the same tendency. Learning took 20 to 30 seconds per person per category, and recognition took 0.02 seconds for the four categories.

FIG. 9 shows the average recognition rate for each person's motions when using HMMs trained on three people (A, B, C). Using the HMMs trained on three people, the recognition rate for each person's motions was measured and compared with that of HMMs trained on one person. FIG. 9 shows the average recognition rates obtained with 17 and with 25 output symbols. The values in parentheses in FIG. 9 indicate the improvement over the recognition rate obtained when training on one person.

With 17 output symbols, the rates for the screwdriver and the hammer improved by 10% or more; with 25 output symbols, the rates for the screwdriver, hammer, and gimlet improved by 7% to 12%. The average recognition rate over the four tools was about 3% higher with 17 output symbols.

FIG. 10 shows how the recognition rate changes as the number of input symbols is varied. As shown in FIG. 10, the recognition rate fell for every tool as the number of input symbols decreased from 50 to 5. This is to be expected, because the amount of time-series feature information decreases.

As described above, it was confirmed that the present invention can recognize with high accuracy, based on optical flow and HMMs, which tool is being used in a video scene in which a person operates various carpentry tools by hand in a real environment.

The difficulties of motion recognition from a monocular camera include the following: fitting a tool shape is difficult because of occlusion by the operator's fingers, back of the hand, and arm; the periodicity of the motion is weak; and the speed of operation varies. The present invention therefore models the operation without separating the combined motion of the hand and the tool, and in the experiments verifying its effect, motion recognition was performed on those motions with the four tools as categories. The optical flow method with a nonlinear robust function suppresses estimation errors caused by discontinuous motion components, and learning and recognition are realized by applying HMMs, which are robust to stretching and compression along the time axis. The symbol time series was generated by mapping the average velocity vector of the optical flow onto the output symbols using a conversion correspondence diagram designed in advance.

Learning was performed with one to three people, and recognition experiments examined factors such as the number of input symbols and the number of output symbols. As a result, when the same person was used for learning and recognition, an average recognition rate of up to 100% was obtained, and even when different people performed the operations for learning and recognition, a recognition rate of up to 88.6% was obtained. Even for short data of 5 input symbols (0.2 seconds), which is difficult to recognize, a high average recognition rate of 79.4% or more was obtained, which demonstrates the robustness and effectiveness of the present invention.

In HMM learning, a high recognition rate was obtained for recognition of the same person. It was also found that a symbol time series as short as 0.2 seconds contains enough features to distinguish the four categories. For learning and recognition with different people, a recognition rate above a certain level was obtained, and training on multiple people was confirmed to improve the overall recognition rate.

It follows that when the number of input symbols is small, the amount of training data (number of people) should be increased, whereas when the number of input symbols is large, less training data (fewer people) suffices. From a practical standpoint, increasing the amount of training data lengthens the learning time, but a smaller number of input symbols makes the recognition calculation (maximum likelihood) correspondingly faster.

FIG. 1 is a system configuration diagram showing the overall processing of the present invention.
FIG. 2 is a diagram explaining the derivation of the average velocity vector.
FIG. 3 is a diagram showing an example of a 2-state, 25-output HMM.
FIG. 4 is a diagram showing examples of conversion correspondence diagrams for converting an average velocity vector into a symbol.
FIG. 5 is a diagram showing an example of generating a symbol time series from a hammer operation.
FIG. 6 is a diagram showing HMM learning for four types of tools.
FIG. 7 is a diagram showing the flow up to motion recognition from a time-series pattern of symbols.
FIG. 8 is a diagram showing the learning process for one operator.
FIG. 9 is a diagram showing the improvement in recognition rate when training on three people compared with training on one person.
FIG. 10 is a diagram showing the change in recognition rate when the number of input symbols is varied.
FIG. 11 is a diagram showing examples of motion scenes in which the four types of tools are operated.

Explanation of symbols

1 Tool motion recognition device
11 Image input unit
12 Image storage unit
13 Speed estimation unit
14 Symbol time series generation unit
15 Learning unit
16 Recognition unit
17 Output unit
100 Motion DB
141 Symbol conversion table

Claims

A tool motion recognition device that recognizes, from motion scenes in time-series images, the motion of a person using various tools, the device comprising:
image input means for inputting time-series images;
image storage means for storing the input time-series images;
speed estimation means for estimating, by optical flow, a velocity vector of a motion scene in the time-series images;
learning means for calculating and learning, based on the estimated velocity vector, HMM model parameters for each tool operation of a plurality of types of tools to be recognized;
recognition means for identifying, for an unknown motion scene of time-series images newly input to the image input means, which tool operation is being performed, based on a velocity vector estimated by the speed estimation means and on the HMMs obtained as the learning result of the learning means; and
output means for outputting the identification result of the tool operation.

The tool motion recognition device according to claim 1,
wherein the speed estimation means sets an objective function within the optical flow framework and estimates the velocity vector by estimating each unknown by a least-squares method, with the velocity components and the luminance variation component as the unknowns.

The tool motion recognition device according to claim 2,
wherein the speed estimation means sets, as the objective function, a nonlinear robust function based on a Lorentzian function or a biweight function, and estimates the unknowns by minimizing the nonlinear robust function thus set.

The tool motion recognition device according to claim 3,
wherein the speed estimation means uses a steepest descent method, a neural network, or the Levenberg-Marquardt method to minimize the nonlinear robust function.

The tool motion recognition device according to claim 3,
wherein the speed estimation means, when minimizing the nonlinear robust function, changes the variance value included in the Lorentzian function or the biweight function stepwise from a large value to a small value.

The tool motion recognition device according to any one of claims 1 to 5,
wherein the learning means converts the estimated velocity vector information into symbols using a predetermined conversion correspondence diagram for converting velocity vector information into symbols, and calculates the HMM model parameters using the converted symbols.

The tool motion recognition device according to claim 6,
wherein the learning means converts the estimated velocity vector information into symbols using, as the conversion correspondence diagram, a diagram in which the symbols are arranged in a concentric pattern according to the magnitude and direction of the velocity vector.

A tool motion recognition method for recognizing, from motion scenes in time-series images, the motion of a person using various tools, the method comprising:
an image input step of inputting time-series images;
an image storage step of storing the input time-series images;
a speed estimation step of estimating, by optical flow, a velocity vector of a motion scene in the time-series images;
a learning step of calculating and learning, based on the estimated velocity vector, HMM model parameters for each tool operation of a plurality of types of tools to be recognized;
a recognition step of identifying, for an unknown motion scene of newly input time-series images, which tool operation is being performed, based on a velocity vector estimated by the speed estimation step and on the HMMs obtained as the learning result of the learning step; and
an output step of outputting the identification result of the tool operation.