JP2021012628A

JP2021012628A - Position and posture estimation device and position and posture estimation method

Info

Publication number: JP2021012628A
Application number: JP2019127595A
Authority: JP
Inventors: 智樹細居; Tomoki Hosoi
Original assignee: Azbil Corp
Current assignee: Azbil Corp
Priority date: 2019-07-09
Filing date: 2019-07-09
Publication date: 2021-02-04

Abstract

Problem to be solved: To estimate the position and orientation of a workpiece more accurately than conventional methods, even when part of the workpiece is occluded by another object. Solution: The device comprises an image acquisition unit 11 that acquires a visible image, a position image, and a normal image; a rough estimation unit 12 that estimates the position and orientation of the workpiece from the visible image acquired by the image acquisition unit 11; a rotation correction unit 13 that, based on the images acquired by the image acquisition unit 11 and the position and orientation estimated by the rough estimation unit 12, determines the valid pixels in the image, compares the normal vectors of a three-dimensional model of the workpiece projected onto the image at that position and orientation with the normal vectors indicated by the normal image, and corrects the rotation component of the position and orientation; and a translation correction unit 14 that, based on the images acquired by the image acquisition unit 11 and the position and orientation corrected by the rotation correction unit 13, determines the valid pixels in the image, compares the three-dimensional positions of the three-dimensional model projected onto the image at that position and orientation with the three-dimensional positions indicated by the position image, and corrects the translation component of the position and orientation. Selected drawing: FIG. 1

Description

The present invention relates to a position and orientation estimation device and a position and orientation estimation method for estimating the position and orientation of a workpiece from an image.

In recent years, robot-based automation has been advancing at production sites such as factories in order to improve production efficiency. As part of this effort, bin-picking technology has been developed. Bin picking is a technique that estimates the position and orientation of workpieces piled randomly in a tray and has a robot hand grasp a workpiece based on the estimation result.

One method of estimating the position and orientation of a workpiece is to estimate a rough position and orientation from an image of a tray in which workpieces are randomly piled, and then, using that position and orientation as an initial value, fit a three-dimensional model of the workpiece to the image data and update the position and orientation step by step (see, for example, Patent Document 1).

Patent Document 1: Japanese Unexamined Patent Publication No. 2010-69542

Non-Patent Document 1: J. Gall and V. Lempitsky, "Class-specific Hough forests for object detection", Computer Vision and Pattern Recognition, 2009.

However, when workpieces are randomly piled, part of a workpiece is often occluded by another object (another workpiece). In that case, the range-image values in the occluded region are outliers that do not represent the true distance to the workpiece. When the conventional method is applied to a range image containing such outliers, the matching score can be low at the true position and orientation and high at an incorrect one, which makes it difficult to search for an accurate position and orientation.

The present invention has been made to solve the above problem, and its object is to provide a position and orientation estimation device that can estimate the position and orientation of a workpiece more accurately than conventional devices, even when part of the workpiece is occluded by another object.

The position and orientation estimation device according to the present invention comprises: an image acquisition unit that acquires a visible image, a position image indicating, for each pixel, the three-dimensional position of the object shown in the visible image, and a normal image indicating, for each pixel, the normal vector of the object shown in the visible image; a rough estimation unit that estimates the position and orientation of a workpiece based on the visible image acquired by the image acquisition unit; a rotation correction unit that, based on the images acquired by the image acquisition unit and the position and orientation estimated by the rough estimation unit, determines the valid pixels in the image, compares the normal vectors of a three-dimensional model of the workpiece projected onto the image at that position and orientation with the normal vectors indicated by the normal image, and corrects the rotation component of the position and orientation; and a translation correction unit that, based on the images acquired by the image acquisition unit and the position and orientation corrected by the rotation correction unit, determines the valid pixels in the image, compares the three-dimensional positions of the three-dimensional model of the workpiece projected onto the image at that position and orientation with the three-dimensional positions indicated by the position image, and corrects the translation component of the position and orientation.

According to the present invention, with the above configuration, the position and orientation of a workpiece can be estimated more accurately than before, even when part of the workpiece is occluded by another object.

FIG. 1 is a diagram showing a configuration example of the position and orientation estimation device according to Embodiment 1.
FIG. 2 is a diagram showing a configuration example of the rotation correction unit in Embodiment 1.
FIG. 3 is a diagram showing a configuration example of the translation correction unit in Embodiment 1.
FIG. 4 is a flowchart showing an operation example of the position and orientation estimation device according to Embodiment 1.
FIGS. 5A to 5C are diagrams showing an example of the image group acquired by the image acquisition unit in Embodiment 1: FIG. 5A shows an example of a visible image, FIG. 5B an example of a position image, and FIG. 5C an example of a normal image.
FIGS. 6A and 6B are diagrams showing an operation example of the rough estimation unit in Embodiment 1.
FIG. 7 is a flowchart showing an operation example of the rotation correction unit in Embodiment 1.
FIGS. 8A to 8C are diagrams showing an example of the projected image group generated by the projection unit in Embodiment 1: FIG. 8A shows an example of a projected label image, FIG. 8B an example of a projected position image, and FIG. 8C an example of a projected normal image.
FIG. 9 is a flowchart showing an operation example of the translation correction unit in Embodiment 1.

Embodiments of the present invention will now be described in detail with reference to the drawings.
Embodiment 1.
FIG. 1 is a diagram showing a configuration example of the position and orientation estimation device 1 according to Embodiment 1.
The position and orientation estimation device 1 estimates the position and orientation of a workpiece. As shown in FIG. 1, it includes an image acquisition unit 11, a rough estimation unit 12, a rotation correction unit 13, and a translation correction unit 14. The position and orientation estimation device 1 is implemented by a processing circuit such as a system LSI (large-scale integration), or by a CPU (central processing unit) or the like that executes a program stored in a memory or the like.

The image acquisition unit 11 acquires an image group, that is, the set of images needed to estimate the position and orientation of the workpiece. The image group includes a visible image, a position image, and a normal image. The position image indicates, for each pixel, the three-dimensional position of the object shown in the visible image. The normal image indicates, for each pixel, the normal vector of the object shown in the visible image.

The rough estimation unit 12 estimates the position and orientation of the workpiece based on the visible image acquired by the image acquisition unit 11. This is a coarse estimate of the position and orientation obtained with a conventionally known method.

Based on the image group acquired by the image acquisition unit 11 and the position and orientation estimated by the rough estimation unit 12, the rotation correction unit 13 determines the valid pixels in the image, then compares the normal vectors of the three-dimensional model of the workpiece projected onto the image at that position and orientation with the normal vectors indicated by the normal image, and corrects the rotation component of the position and orientation. The rotation correction unit 13 is described in detail later.

Based on the image group acquired by the image acquisition unit 11 and the position and orientation corrected by the rotation correction unit 13, the translation correction unit 14 determines the valid pixels in the image, then compares the three-dimensional positions of the three-dimensional model of the workpiece projected onto the image at that position and orientation with the three-dimensional positions indicated by the position image, and corrects the translation component of the position and orientation. The translation correction unit 14 is described in detail later.

Next, a configuration example of the rotation correction unit 13 will be described with reference to FIG. 2.
As shown in FIG. 2, the rotation correction unit 13 has a projection unit (first projection unit) 131, a pixel determination unit (first pixel determination unit) 132, and a rotation update unit 133.

The projection unit 131 generates a projected image group by projecting the three-dimensional model of the workpiece onto the image at the position and orientation estimated by the rough estimation unit 12. The projected image group includes a projected label image, a projected position image, and a projected normal image. For the three-dimensional model projected at the position and orientation estimated by the rough estimation unit 12, the projected label image indicates, for each pixel, the identifier of the model plane visible at that pixel; the projected position image indicates, for each pixel, the three-dimensional position of the model; and the projected normal image indicates, for each pixel, the normal vector of the model.

Based on the image group acquired by the image acquisition unit 11 and the projected image group generated by the projection unit 131, the pixel determination unit 132 determines the pixels that belong to valid planes in the projected label image to be valid pixels.

Based on the determination result of the pixel determination unit 132, the rotation update unit 133 generates a rotation matrix by singular value decomposition from the difference between the normal vectors indicated by the projected normal image generated by the projection unit 131 and the normal vectors indicated by the normal image acquired by the image acquisition unit 11, and thereby updates the rotation component of the position and orientation estimated by the rough estimation unit 12.
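The text states only that the rotation matrix is obtained by singular value decomposition. One common way to do this, shown here as a minimal sketch and not necessarily the procedure used in this embodiment, is the orthogonal Procrustes (Kabsch) solution: given the projected model normals and the measured normals at the valid pixels, it finds the rotation that best maps one set onto the other in the least-squares sense. The function name `rotation_from_normals` and its arguments are hypothetical.

```python
import numpy as np


def rotation_from_normals(model_normals, observed_normals):
    """Return the 3x3 rotation R minimizing sum ||R @ a_i - b_i||^2.

    model_normals:    (N, 3) normals from the projected normal image (valid pixels only).
    observed_normals: (N, 3) normals from the measured normal image at the same pixels.
    Assumption: this is the Kabsch solution, used here for illustration.
    """
    a = np.asarray(model_normals, dtype=float)
    b = np.asarray(observed_normals, dtype=float)
    h = a.T @ b                        # 3x3 correlation matrix between the two sets
    u, _, vt = np.linalg.svd(h)
    # Guard against a reflection: force det(R) = +1.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    return vt.T @ np.diag([1.0, 1.0, d]) @ u.T
```

The resulting matrix would then be composed with the rotation of the rough position and orientation to give the updated rotation component.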

Next, a configuration example of the translation correction unit 14 will be described with reference to FIG. 3.
As shown in FIG. 3, the translation correction unit 14 has a projection unit (second projection unit) 141, a pixel determination unit (second pixel determination unit) 142, and a translation update unit 143.

The projection unit 141 generates a projected image group by projecting the three-dimensional model of the workpiece onto the image at the position and orientation corrected by the rotation correction unit 13. The projected image group includes a projected label image, a projected position image, and a projected normal image. For the three-dimensional model projected at the position and orientation corrected by the rotation correction unit 13, the projected label image indicates, for each pixel, the identifier of the model plane visible at that pixel; the projected position image indicates, for each pixel, the three-dimensional position of the model; and the projected normal image indicates, for each pixel, the normal vector of the model.

Based on the captured image group acquired by the image acquisition unit 11 and the projected image group generated by the projection unit 141, the pixel determination unit 142 determines the pixels that belong to valid planes in the projected label image to be valid pixels.

Based on the determination result of the pixel determination unit 142, the translation update unit 143 calculates the equation of each plane of the object from the position image acquired by the image acquisition unit 11, and then, from the positional relationship between each pixel of the projected position image generated by the projection unit 141 and its corresponding plane, updates by the least squares method the translation component of the position and orientation corrected by the rotation correction unit 13.
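The least-squares step can be illustrated as follows. This is a sketch under stated assumptions, not the embodiment's exact formulation: each measured plane is written as n·x + d = 0 with unit normal n, each valid pixel of the projected position image contributes one model point p assigned to one plane, and the translation t is chosen to minimize the sum of squared point-to-plane distances n·(p + t) + d. The function name and argument layout are hypothetical.

```python
import numpy as np


def translation_from_planes(model_points, plane_normals, plane_offsets):
    """Least-squares translation t that moves model points onto their measured planes.

    model_points:  (N, 3) points from the projected position image (valid pixels only).
    plane_normals: (N, 3) unit normal of the measured plane assigned to each point.
    plane_offsets: (N,)   offset d of that plane, with n . x + d = 0.
    """
    p = np.asarray(model_points, dtype=float)
    n = np.asarray(plane_normals, dtype=float)
    d = np.asarray(plane_offsets, dtype=float)
    signed_distance = np.einsum("ij,ij->i", n, p) + d   # current point-to-plane distances
    # Solve n_i . t = -distance_i for t in the least-squares sense.
    t, *_ = np.linalg.lstsq(n, -signed_distance, rcond=None)
    return t
```

The system has a unique solution only when the normals span three dimensions, which matches the requirement stated elsewhere in the description that three or more planes with independent normals be visible.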

Next, an operation example of the position and orientation estimation device 1 according to Embodiment 1 shown in FIG. 1 will be described with reference to FIG. 4.
When estimating the position and orientation of a workpiece, suppose that the surface of the workpiece consists of planes, that three or more of those planes with mutually independent normal vectors can be detected in the image, and that the equation of each plane can be determined. In that case, the position and orientation of the workpiece is uniquely determined. The equation of each plane can be obtained, starting from the result of the rough estimation, by applying the least squares method or the like to the pixels contained in the region of that plane.

Conventionally, the position and orientation is estimated by taking in as much image data of the workpiece surface as possible and comparing, pixel by pixel, the image data with the coordinate-transformed three-dimensional model. For pixels where the comparison shows a deviation, however, it is difficult to judge whether the deviation is caused by an anomaly such as occlusion, so the estimation result is strongly affected by occlusion.

The position and orientation estimation device 1 according to Embodiment 1 also compares the image data with the coordinate-transformed three-dimensional model, as in the conventional approach. Unlike the conventional approach, however, the purpose of the comparison is to determine the plane equations, so pixels with a large deviation are ignored and only pixels with a small deviation are used to determine the plane equations. As a result, the position and orientation estimation device 1 according to Embodiment 1 obtains stable estimation results that are little affected by occlusion.
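The idea of fitting a plane only to pixels with a small deviation can be sketched as below. This is an illustration under assumptions: the deviation is taken to be the Euclidean distance between the measured and projected three-dimensional positions, the threshold `max_deviation` is a hypothetical parameter, and the plane is fitted by a total least-squares (SVD) fit. The actual validity criteria of the pixel determination units are not given in this part of the description.

```python
import numpy as np


def fit_plane(points):
    """Fit a plane n . x + d = 0 to 3-D points; returns (unit normal n, offset d)."""
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)
    # The right singular vector with the smallest singular value is the plane normal.
    _, _, vt = np.linalg.svd(pts - centroid)
    normal = vt[-1]
    return normal, -float(normal @ centroid)


def fit_plane_ignoring_deviations(observed, projected, max_deviation):
    """Fit a plane to the observed points whose deviation from the model is small.

    observed:  (N, 3) positions from the position image for one model plane.
    projected: (N, 3) positions of the projected model at the same pixels.
    max_deviation: hypothetical threshold; points farther than this are ignored.
    """
    observed = np.asarray(observed, dtype=float)
    projected = np.asarray(projected, dtype=float)
    keep = np.linalg.norm(observed - projected, axis=1) <= max_deviation
    normal, offset = fit_plane(observed[keep])
    return normal, offset, keep
```

A pixel covered by another workpiece yields a measured position far from the projected model and is dropped before the fit, so it cannot bias the plane equation.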

Note, however, that the shape of the workpiece handled by the position and orientation estimation device 1 according to Embodiment 1 must satisfy certain conditions: the surface of the workpiece consists of planes, and in any orientation three or more planes with mutually independent normal vectors can be detected in the image.
Although it is subject to these constraints, the position and orientation estimation device 1 according to Embodiment 1 only needs to determine the equation of each plane making up the workpiece surface, and therefore obtains position and orientation estimates that are more robust to occlusion and the like than conventional methods.

In this way, the position and orientation estimation device 1 according to Embodiment 1 estimates the position and orientation of the workpiece from the visible image, determines the valid pixels in the image and corrects the rotation component of the position and orientation based on the normal image, and then determines the valid pixels in the image and corrects the translation component of the position and orientation based on the position image. It thereby estimates the workpiece position and orientation needed to grasp the workpiece with a robot hand or the like.

Here, the position and orientation consists of six values [θ_x, θ_y, θ_z, t_x, t_y, t_z]: the rotation amounts [radian] and the translation amounts [mm] in three-dimensional space. From these values, a coordinate transformation matrix in three-dimensional space can be generated as given by equation (1), in which M denotes the coordinate transformation matrix.

Conversely, each element of the position and orientation can be obtained from the elements of the coordinate transformation matrix, as given by equation (2).
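Equations (1) and (2) are not reproduced in this text, so the rotation order they use is unknown. The sketch below shows one self-consistent pair of conversions under an explicitly assumed convention, R = Rz(θ_z) · Ry(θ_y) · Rx(θ_x), with M the 3×4 matrix [R | t] whose elements correspond to m_11 to m_34. The function names are hypothetical, and the inverse is valid only for |θ_y| < π/2.

```python
import numpy as np


def pose_to_matrix(pose):
    """Build the 3x4 matrix M = [R | t] from [theta_x, theta_y, theta_z, t_x, t_y, t_z].

    Assumption: R = Rz @ Ry @ Rx. The convention of equation (1) may differ.
    """
    theta_x, theta_y, theta_z, t_x, t_y, t_z = pose
    cx, sx = np.cos(theta_x), np.sin(theta_x)
    cy, sy = np.cos(theta_y), np.sin(theta_y)
    cz, sz = np.cos(theta_z), np.sin(theta_z)
    rot_x = np.array([[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]])
    rot_y = np.array([[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]])
    rot_z = np.array([[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]])
    return np.column_stack((rot_z @ rot_y @ rot_x, np.array([t_x, t_y, t_z])))


def matrix_to_pose(m):
    """Recover the six pose values from M under the same assumed convention."""
    m = np.asarray(m, dtype=float)
    theta_y = np.arcsin(np.clip(-m[2, 0], -1.0, 1.0))   # m_31 = -sin(theta_y)
    theta_x = np.arctan2(m[2, 1], m[2, 2])               # m_32 / m_33 = tan(theta_x)
    theta_z = np.arctan2(m[1, 0], m[0, 0])               # m_21 / m_11 = tan(theta_z)
    return np.array([theta_x, theta_y, theta_z, m[0, 3], m[1, 3], m[2, 3]])
```

With this convention the two functions are inverses of each other away from the singularity at θ_y = ±π/2.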

Also, as given by equation (3), an arbitrary point in three-dimensional space is transformed into another point in three-dimensional space using the elements of the coordinate transformation matrix. In equation (3), (x, y, z) is an arbitrary point in three-dimensional space before the transformation, and (x', y', z') is the point in three-dimensional space after the transformation by (m_11 to m_34).
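Equation (3) itself is not reproduced in this text. Given that the matrix elements run from m_11 to m_34, it presumably has the usual affine form below; this is a reconstruction from the surrounding definitions, not a copy of the original equation.

```latex
\begin{pmatrix} x' \\ y' \\ z' \end{pmatrix}
=
\begin{pmatrix}
m_{11} & m_{12} & m_{13} & m_{14} \\
m_{21} & m_{22} & m_{23} & m_{24} \\
m_{31} & m_{32} & m_{33} & m_{34}
\end{pmatrix}
\begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix}
```

Under this reading, the left 3×3 block carries the rotation and the fourth column carries the translation.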

Also, as given by equation (4), an arbitrary point in three-dimensional space is converted, using calibration coefficients specific to the camera, into the image coordinates at which it is projected onto the image. In equation (4), (c_11 to c_34) are the calibration coefficients, and (X, Y) are the image coordinates after conversion by (c_11 to c_34).
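Equation (4) is not reproduced in this text. Assuming the calibration coefficients c_11 to c_34 form a standard 3×4 projection matrix applied in homogeneous coordinates, the conversion can be sketched as follows. The function name `project` is hypothetical, and the actual form of equation (4) may differ.

```python
import numpy as np


def project(calibration, point):
    """Project a 3-D point to image coordinates (X, Y).

    calibration: 3x4 array of coefficients c_11 .. c_34 (assumed projection matrix).
    point:       (x, y, z) in the camera's three-dimensional coordinate system.
    """
    c = np.asarray(calibration, dtype=float).reshape(3, 4)
    x, y, z = point
    h = c @ np.array([x, y, z, 1.0])   # homogeneous image coordinates
    return h[0] / h[2], h[1] / h[2]     # divide by the third component
```

For example, with a pinhole-style matrix of focal length 500 pixels and principal point (320, 240), the point (10, -20, 1000) maps to image coordinates (325, 230).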

In the operation example of the position and orientation estimation device 1 according to Embodiment 1 shown in FIG. 1, as shown in FIG. 4, the image acquisition unit 11 first acquires an image group (a visible image, a position image, and a normal image) (step ST401). The image group is described with reference to FIG. 5.

As shown in FIG. 5A, the visible image is an image in which the objects appear. Each pixel of the visible image consists of one integer. The visible image can be acquired by capturing with an ordinary camera.

As shown in FIG. 5B, the position image indicates, for each pixel, the three-dimensional position of the object shown in the visible image. Each pixel of the position image represents coordinates in three-dimensional space and consists of three floating-point numbers. The position image in FIG. 5B shows, from top to bottom, the image of the X coordinates, the image of the Y coordinates, and the image of the Z coordinates of the three-dimensional positions. Methods of acquiring the position image include the spatial coding method, the stereo matching method, and the TOF (Time-Of-Flight) method. The spatial coding method calculates distance from multiple images in which stripe patterns of different pitches are projected onto the measurement target. The stereo matching method calculates distance from images captured by multiple cameras using the principle of triangulation. The TOF method calculates distance using the time of flight of light.

As shown in FIG. 5C, the normal image indicates, for each pixel, the normal vector of the object shown in the visible image. Each pixel of the normal image consists of three floating-point numbers. The normal image in FIG. 5C shows, from top to bottom, the image of the X-axis components, the image of the Y-axis components, and the image of the Z-axis components of the normal vectors. Methods of acquiring the normal image include applying the least squares method, the eigenspace method, or the like to the data of the surrounding pixels in the position image.
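As an illustration of deriving a normal image from a position image, the sketch below uses an eigenvector-based plane fit over a small window around each pixel: the normal is the eigenvector of the local scatter matrix with the smallest eigenvalue. The window size, the treatment of border pixels (left as zero), and the sign convention (normals oriented so that their Z component is not positive, assuming the camera looks along +Z) are all assumptions made for this example.

```python
import numpy as np


def normal_image(position_image, half_window=1):
    """Estimate a per-pixel normal image from an (H, W, 3) position image."""
    pos = np.asarray(position_image, dtype=float)
    height, width, _ = pos.shape
    normals = np.zeros_like(pos)
    for v in range(half_window, height - half_window):
        for u in range(half_window, width - half_window):
            patch = pos[v - half_window:v + half_window + 1,
                        u - half_window:u + half_window + 1].reshape(-1, 3)
            centered = patch - patch.mean(axis=0)
            # eigh returns eigenvalues in ascending order, so column 0 is the
            # direction of least variance, i.e. the local surface normal.
            _, eigenvectors = np.linalg.eigh(centered.T @ centered)
            n = eigenvectors[:, 0]
            if n[2] > 0:          # assumed convention: normals face the camera
                n = -n
            normals[v, u] = n
    return normals
```

A practical implementation would also skip pixels with missing range data and vectorize the loops; both are omitted here for brevity.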

Next, the rough estimation unit 12 estimates the position and orientation of the workpiece based on the visible image acquired by the image acquisition unit 11 (step ST402). The estimation methods available to the rough estimation unit 12 include various conventionally known methods, such as methods using template matching and methods using boosting. The following describes the case where the rough estimation unit 12 uses the Hough Forest method disclosed in Non-Patent Document 1. This method uses a rough estimation model trained in advance.

As shown in FIG. 6A, when an image showing a fragment of the workpiece (a fragment image, reference numeral 601) is input, the rough estimation model (reference numeral 602) estimates which part of the workpiece, in which orientation, the fragment image depicts, and outputs the orientation and position of the workpiece. The orientation here is the orientation of the workpiece in the training image (reference numeral 603) closest to the fragment image. The position here is the displacement in pixels (reference numeral 604) from the center of the fragment image to the center of the workpiece.

In the actual estimation by the rough estimation unit 12, a voting image is first prepared for each learned orientation. Each voting image has the same size as the visible image, and each of its pixel values represents the number of votes that pixel has received as a candidate for the workpiece center position. All pixel values are initialized to 0.

As shown in FIG. 6B, when the visible image (reference numeral 605) is input, the rough estimation unit 12 cuts out, for every pixel, a fragment image centered on that pixel and inputs it to the rough estimation model. The rough estimation model outputs the closest of the learned orientations and the displacement from that pixel to the workpiece center, and the rough estimation unit 12 adds 1 to the pixel corresponding to the estimated workpiece center position in the voting image (reference numeral 606) of the estimated orientation.

When the above processing is finished, the rough estimation unit 12 selects the orientation and position with the most votes among all the voting images, and from this information obtains the position and orientation (P^0 = [θ_x^0, θ_y^0, θ_z^0, t_x^0, t_y^0, t_z^0]). The image indicated by reference numeral 607 shows the estimation result of the rough estimation unit 12: a circle is drawn on the visible image at the position estimated by the rough estimation unit 12. Hereinafter, the position and orientation estimated by the rough estimation unit 12 is referred to as the rough position and orientation.
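The voting procedure described above can be sketched as follows. The trained rough estimation model is represented by a hypothetical callable `predict(patch)` that returns an orientation index and a (row, column) displacement to the workpiece center; it stands in for the Hough Forest and is not an implementation of it. Padding the image border by edge replication is also an assumption, since the text does not say how patches at the image boundary are handled.

```python
import numpy as np


def vote_for_pose(visible, predict, num_orientations, half_patch):
    """Accumulate center votes per learned orientation and return the winner.

    visible: (H, W) visible image.
    predict: hypothetical model, patch -> (orientation_index, (d_row, d_col)).
    Returns (best_orientation_index, (center_row, center_col), votes).
    """
    height, width = visible.shape
    # One voting image per learned orientation, initialized to 0.
    votes = np.zeros((num_orientations, height, width), dtype=np.int64)
    padded = np.pad(visible, half_patch, mode="edge")
    size = 2 * half_patch + 1
    for v in range(height):
        for u in range(width):
            patch = padded[v:v + size, u:u + size]   # fragment centered on (v, u)
            orientation, (d_row, d_col) = predict(patch)
            row, col = v + d_row, u + d_col
            if 0 <= row < height and 0 <= col < width:
                votes[orientation, row, col] += 1
    best = np.unravel_index(np.argmax(votes), votes.shape)
    return int(best[0]), (int(best[1]), int(best[2])), votes
```

The winning orientation index and center pixel would then be converted into the six values of the rough position and orientation; that conversion depends on the training data and is not shown.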

Next, the rotation correction unit 13 determines the valid pixels in the image and then, based on the normal image acquired by the image acquisition unit 11, corrects the rotation component of the position and orientation estimated by the rough estimation unit 12 (the rough position and orientation) (step ST403). The rotation correction unit 13 determines the amount of rotation correction by comparing the normal vectors of the three-dimensional model of the workpiece projected onto the image at the rough position and orientation with the normal vectors indicated by the normal image. Hereinafter, the position and orientation corrected by the rotation correction unit 13 is referred to as the intermediate position and orientation.

That is, the rotation correction unit 13 corrects the rotation amount of the rough position and orientation (P^0 = [θ_x^0, θ_y^0, θ_z^0, t_x^0, t_y^0, t_z^0]) to obtain the intermediate position and orientation (P^1 = [θ_x^1, θ_y^1, θ_z^1, t_x^1, t_y^1, t_z^1]).

Next, the translation correction unit 14 determines the valid pixels on the image and then, based on the position image acquired by the image acquisition unit 11, corrects the translation amount of the position and orientation corrected by the rotation correction unit 13 (the intermediate position and orientation) (step ST404). The translation correction unit 14 determines the correction to the translation amount by comparing the three-dimensional positions of the three-dimensional model of the workpiece, projected onto the image at the intermediate position and orientation, with the three-dimensional positions indicated by the position image. Hereinafter, the position and orientation after correction by the translation correction unit 14 is referred to as the detailed position and orientation.

That is, the translation correction unit 14 corrects the translation amount of the intermediate position and orientation (P^1 = [θ_x^1, θ_y^1, θ_z^1, t_x^1, t_y^1, t_z^1]) to obtain the detailed position and orientation (P^2 = [θ_x^2, θ_y^2, θ_z^2, t_x^2, t_y^2, t_z^2]).

Next, an operation example of the rotation correction unit 13 of Embodiment 1 shown in FIG. 2 is described with reference to FIG. 7.
In this operation example, as shown in FIG. 7, the projection unit 131 first projects the three-dimensional model of the workpiece onto the image at the position and orientation estimated by the rough estimation unit 12, generating a group of projected images (a projected label image, a projected position image, and a projected normal image) (step ST701).

Each point on the surface of the three-dimensional model of the workpiece, which is generated from the information in a CAD (Computer-Aided Design) file, holds its position in three-dimensional space, the normal direction of the surface, identification information of the plane to which it belongs, and so on. The projection unit 131 first generates a coordinate transformation matrix from the position and orientation estimated by the rough estimation unit 12, using equation (1). Next, using equation (3), the projection unit 131 applies the coordinate transformation in three-dimensional space to each point on the surface of the three-dimensional model of the workpiece. Then, using equation (4), the projection unit 131 identifies the image coordinates onto which each point projects and projects the information of each point onto the image. As a result, the projection unit 131 generates a projected label image, a projected position image, and a projected normal image, as shown in FIG. 8.
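The projection step can be illustrated with the sketch below. Equations (1), (3), and (4) are not reproduced in this passage, so the sketch makes its own assumptions: the pose is given as a 4x4 homogeneous matrix `M`, the camera is an ideal pinhole with hypothetical intrinsics `fx`, `fy`, `cx`, `cy`, and when several model points fall on the same pixel the one nearest the camera is kept. None of these details are confirmed by the source.

```python
import numpy as np

def project_model(points, normals, labels, M, fx, fy, cx, cy, height, width):
    """Project model surface points into label, position, and normal images.

    points, normals: (N, 3) arrays in model coordinates.
    labels: (N,) non-negative plane identifiers.
    M: 4x4 homogeneous transform from model to camera coordinates (assumed).
    """
    R, t = M[:3, :3], M[:3, 3]
    cam_pts = points @ R.T + t       # transform positions into the camera frame
    cam_nrm = normals @ R.T          # normals are rotated but not translated
    label_img = np.full((height, width), -1, dtype=np.int32)  # negative = no plane
    pos_img = np.zeros((height, width, 3))
    nrm_img = np.zeros((height, width, 3))
    depth = np.full((height, width), np.inf)
    for p, n, lab in zip(cam_pts, cam_nrm, labels):
        if p[2] <= 0:
            continue                 # point is behind the camera
        u = int(round(fx * p[0] / p[2] + cx))
        v = int(round(fy * p[1] / p[2] + cy))
        # Keep only the nearest point per pixel (simple z-buffer, an assumption).
        if 0 <= u < width and 0 <= v < height and p[2] < depth[v, u]:
            depth[v, u] = p[2]
            label_img[v, u] = lab
            pos_img[v, u] = p
            nrm_img[v, u] = n
    return label_img, pos_img, nrm_img
```

Pixels that receive no model point keep the label -1, which matches the convention that a negative label means the pixel belongs to no plane.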

As shown in FIG. 8A, each pixel of the projected label image holds a single integer value: the identification number of the plane to which the pixel belongs. A negative pixel value indicates that the pixel does not belong to any plane.

The pixel data of the projected position image has the same form as that of the position image. The projected position image shown in FIG. 8B consists of, from top to bottom, an image of the X coordinate, an image of the Y coordinate, and an image of the Z coordinate of the three-dimensional position.

The pixel data of the projected normal image has the same form as that of the normal image. The projected normal image shown in FIG. 8C consists of, from top to bottom, an image of the X-axis component, an image of the Y-axis component, and an image of the Z-axis component of the normal vector.

Next, based on the group of images acquired by the image acquisition unit 11 and the group of projected images generated by the projection unit 131, the pixel determination unit 132 determines the pixels of the projected label image that belong to a valid plane to be valid pixels (step ST702).

In doing so, the pixel determination unit 132 first determines, from the pixel values of the projected label image, the plane to which each pixel belongs. If the pixel value is negative, the pixel determination unit 132 determines the pixel to be invalid.
Next, if there is a pixel for which the difference between the three-dimensional position indicated by the position image and the three-dimensional position indicated by the projected position image is equal to or greater than a threshold, the pixel determination unit 132 determines that pixel to be invalid and sets its pixel value in the projected label image to a negative value.
Likewise, if there is a pixel for which the inner product of the normal vector indicated by the normal image and the normal vector indicated by the projected normal image is equal to or less than a threshold, the pixel determination unit 132 determines that pixel to be invalid and sets its pixel value in the projected label image to a negative value.
The pixels of the projected label image that have not been determined to be invalid by this processing are the valid pixels.

In the above description, the pixel determination unit 132 performs both the pixel determination using three-dimensional positions and the pixel determination using normal vectors. However, the pixel determination unit 132 may perform only one of these two determinations.
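The validity checks above can be sketched as a single masking operation. This is an illustrative sketch under stated assumptions: the "difference" between three-dimensional positions is taken to be the Euclidean distance, -1 is used as the negative marker, and the function name and threshold parameters (`pos_threshold`, `dot_threshold`) are hypothetical. The source does not specify threshold values or the distance measure.

```python
import numpy as np

def mark_invalid_pixels(label_img, pos_img, proj_pos_img,
                        nrm_img, proj_nrm_img, pos_threshold, dot_threshold):
    """Return a copy of the projected label image with invalid pixels set to -1.

    label_img: (H, W) integer projected label image (negative = no plane).
    pos_img, proj_pos_img: (H, W, 3) measured and projected 3D positions.
    nrm_img, proj_nrm_img: (H, W, 3) measured and projected normal vectors.
    """
    labels = label_img.copy()
    # Position check: distance between measured and projected positions.
    pos_diff = np.linalg.norm(pos_img - proj_pos_img, axis=-1)
    # Normal check: inner product of measured and projected normals.
    dots = np.sum(nrm_img * proj_nrm_img, axis=-1)
    invalid = (labels < 0) | (pos_diff >= pos_threshold) | (dots <= dot_threshold)
    labels[invalid] = -1
    return labels
```

To run only one of the two checks, as the text permits, the corresponding term can be dropped from the `invalid` mask. Pixels occluded by another object tend to fail the position check because the measured surface lies in front of the projected model.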

Next, the rotation update unit 133 updates the rotation amount (step ST703). The rotation update unit 133 updates the rotation amount by generating a rotation matrix, based on singular value decomposition, from the difference between the normal vectors indicated by the projected normal image and the normal vectors indicated by the normal image.

The rotation update unit 133 evaluates the difference between the normal vectors indicated by the projected normal image at the rough position and orientation and the normal vectors indicated by the normal image, generates a coordinate transformation matrix that corrects this difference, and calculates the intermediate position and orientation from the coordinate transformation matrix using equation (2).

Here, to obtain the rotation amount to be corrected, the rotation update unit 133 computes the matrix shown in equation (5), using the information of all pixels that have non-negative values in the projected label image. In equation (5), (p_i', q_i', r_i') is the normal vector of the i-th pixel in the normal image, and (p_i, q_i, r_i) is the normal vector of the i-th pixel in the projected normal image.

Next, the rotation update unit 133 applies singular value decomposition to the matrix generated by equation (5) and obtains a rotation matrix according to equations (6) and (7). In equation (7), R denotes the rotation matrix.
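Equations (5) to (7) are not reproduced in this passage. As an illustration of how a rotation can be recovered from paired normal vectors by singular value decomposition, the sketch below uses the standard Kabsch (orthogonal Procrustes) solution. It is assumed here to be similar in spirit to the method described, but it may differ in detail from the equations of the source; the function name and argument layout are hypothetical.

```python
import numpy as np

def rotation_from_normals(proj_normals, meas_normals):
    """Estimate the rotation R that best maps projected normals onto measured ones.

    proj_normals: (N, 3) normals (p_i, q_i, r_i) from the projected normal image.
    meas_normals: (N, 3) normals (p_i', q_i', r_i') from the normal image.
    Only valid pixels should be passed in.
    """
    # 3x3 correlation matrix: sum over pixels of (projected normal)(measured normal)^T.
    H = proj_normals.T @ meas_normals
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection so that det(R) = +1.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])
    return Vt.T @ D @ U.T
```

With this convention, `R @ n_projected` approximates `n_measured` in the least-squares sense. The normals must span at least two non-parallel directions for the rotation to be determined uniquely.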

Next, as shown in equation (8), the rotation update unit 133 generates, from the rotation matrix obtained by equation (7), a coordinate transformation matrix representing the correction to the rotation amount. In equation (8), ΔM^10 denotes this coordinate transformation matrix. Then, as shown in equation (9), the rotation update unit 133 multiplies the coordinate transformation matrix of the rough position and orientation by the coordinate transformation matrix generated by equation (8), thereby obtaining the coordinate transformation matrix of the intermediate position and orientation. In equation (9), M^0 denotes the coordinate transformation matrix of the rough position and orientation, and M^1 denotes the coordinate transformation matrix of the intermediate position and orientation.

After that, the rotation update unit 133 obtains the output intermediate position and orientation (P^1 = [θ_x^1, θ_y^1, θ_z^1, t_x^1, t_y^1, t_z^1]) from the coordinate transformation matrix of the intermediate position and orientation using equation (2).

Next, an operation example of the translation correction unit 14 of Embodiment 1 shown in FIG. 3 is described with reference to FIG. 9.
In this operation example, as shown in FIG. 9, the projection unit 141 first projects the three-dimensional model of the workpiece onto the image at the position and orientation corrected by the rotation correction unit 13, generating a group of projected images (a projected label image, a projected position image, and a projected normal image) (step ST901).

Next, based on the group of images acquired by the image acquisition unit 11 and the group of projected images generated by the projection unit 141, the pixel determination unit 142 determines the pixels of the projected label image that belong to a valid plane to be valid pixels (step ST902).

The processing by the projection unit 141 in step ST901 and the processing by the pixel determination unit 142 in step ST902 use different data, but the processing itself is the same as that of the projection unit 131 in step ST701 and of the pixel determination unit 132 in step ST702, respectively.

Next, the translation update unit 143 updates the translation amount (step ST903). The translation update unit 143 calculates the equation of each plane of the object from the position image and updates the translation amount by the least squares method, based on the positional relationship between each pixel in the projected position image and its corresponding plane.

The translation update unit 143 evaluates the difference between the three-dimensional positions indicated by the projected position image at the intermediate position and orientation and the three-dimensional positions indicated by the position image, generates a coordinate transformation matrix that corrects this difference, and calculates the detailed position and orientation from the coordinate transformation matrix using equation (2).

Consider a situation in which an observation point in three-dimensional space actually lies on a known plane expressed by ax + by + cz + d = 0, but is observed at coordinates (x, y, z) that are offset from its true coordinates by an unknown error (Δx, Δy, Δz). The true coordinates of the observation point, (x + Δx, y + Δy, z + Δz), cannot yet be determined at this stage, but since the point at least lies on the plane, it satisfies the condition of equation (10).
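Equation (10) itself is not reproduced in this passage. From the description above (the true point (x + Δx, y + Δy, z + Δz) lies on the plane ax + by + cz + d = 0), it presumably takes the following form; the rearranged second line is added here only to show that the condition is linear in the unknown error:

```latex
a(x + \Delta x) + b(y + \Delta y) + c(z + \Delta z) + d = 0 \tag{10}

a\,\Delta x + b\,\Delta y + c\,\Delta z = -(ax + by + cz + d)
```

Each observation therefore contributes one linear constraint on (Δx, Δy, Δz), whose right-hand side is determined by how far the observed point lies off the plane.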

Applied to the translation update unit 143, the known plane corresponds to each plane calculated from the projected label image and the position image, and the observation point corresponds to each pixel datum in the projected position image.
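The source does not state how the plane equations are calculated from the projected label image and the position image. One plausible approach, shown here purely as an assumption, is a total least squares fit per plane label: collect the measured three-dimensional positions of all valid pixels with a given label and take the direction of least variance as the plane normal. The function name `fit_planes` and the dictionary output are hypothetical.

```python
import numpy as np

def fit_planes(label_img, pos_img):
    """Fit a plane a*x + b*y + c*z + d = 0 to the measured points of each label.

    label_img: (H, W) projected label image; negative values are ignored.
    pos_img: (H, W, 3) measured position image.
    Returns {label: array([a, b, c, d])} with a unit normal (a, b, c).
    """
    planes = {}
    for s in np.unique(label_img[label_img >= 0]):
        pts = pos_img[label_img == s]
        if len(pts) < 3:
            continue  # a plane needs at least three points
        centroid = pts.mean(axis=0)
        # The right singular vector with the smallest singular value is the
        # direction of least variance, i.e. the plane normal.
        _, _, Vt = np.linalg.svd(pts - centroid)
        normal = Vt[-1]
        planes[int(s)] = np.append(normal, -normal @ centroid)
    return planes
```

The sign of the fitted normal is arbitrary, which does not affect the plane equation itself since (a, b, c, d) and (-a, -b, -c, -d) describe the same plane.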

Applying equation (10) to all valid pixels in the projected label image yields multiple equations, as in equation (11). Here, N_i is the number of valid pixels, (x_i, y_i, z_i) are the observed coordinates of the i-th pixel, s(i) is the number of the plane to which the i-th pixel belongs, (a_s, b_s, c_s, d_s) are the coefficients of the s-th plane, and (Δx, Δy, Δz) is the unknown translation error.

Arranging equation (11) by plane and writing it in matrix form gives equation (12). In equation (12), N_s is the number of observed planes. The unknown translation error can be obtained by the least squares method using equation (13).
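Equations (11) to (13) are not reproduced in this passage. The sketch below illustrates the least squares step under stated assumptions: each valid pixel contributes one row a·Δx + b·Δy + c·Δz = -(a·x + b·y + c·z + d) using the coefficients of its plane, and the stacked system is solved with `numpy.linalg.lstsq`. The source groups the equations by plane in equation (12); this sketch stacks one row per pixel instead, which may weight planes differently. The function name and argument layout are hypothetical.

```python
import numpy as np

def estimate_translation_error(points, plane_ids, planes):
    """Solve for the translation error (dx, dy, dz) in the least squares sense.

    points: (N, 3) observed coordinates (x_i, y_i, z_i) of the valid pixels.
    plane_ids: (N,) index s(i) of the plane each pixel belongs to.
    planes: (S, 4) plane coefficients (a_s, b_s, c_s, d_s).
    """
    coeffs = planes[plane_ids]              # (N, 4): coefficients per pixel
    A = coeffs[:, :3]                       # left-hand side: plane normals
    # Right-hand side: negative signed residual of each observed point.
    b = -(np.sum(A * points, axis=1) + coeffs[:, 3])
    delta, *_ = np.linalg.lstsq(A, b, rcond=None)
    return delta
```

The solution is unique only if the observed planes have at least three linearly independent normals; with fewer, `lstsq` returns the minimum-norm solution, which leaves the unconstrained directions at zero.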

As shown in equation (14), the detailed position and orientation (P^2 = [θ_x^2, θ_y^2, θ_z^2, t_x^2, t_y^2, t_z^2]) is obtained by correcting the translation amount of the intermediate position and orientation (P^1 = [θ_x^1, θ_y^1, θ_z^1, t_x^1, t_y^1, t_z^1]).

The detailed position and orientation (P^2) is the final output of the position and orientation estimation device 1.
When the detailed position and orientation is used in a bulk picking system, the position and orientation estimation device 1 notifies the bulk picking system of the detailed position and orientation as the position and orientation of the workpiece, and the bulk picking system drives the robot hand based on this position and orientation to grip the workpiece.

As described above, according to Embodiment 1, the position and orientation estimation device 1 includes: the image acquisition unit 11, which acquires a visible image, a position image indicating, for each pixel, the three-dimensional position of the object shown in the visible image, and a normal image indicating, for each pixel, the normal vector of the object shown in the visible image; the rough estimation unit 12, which estimates the position and orientation of the workpiece based on the visible image acquired by the image acquisition unit 11; the rotation correction unit 13, which determines the valid pixels on the image based on the acquisition result of the image acquisition unit 11 and the position and orientation estimated by the rough estimation unit 12, then compares the normal vectors of the three-dimensional model of the workpiece projected onto the image at that position and orientation with the normal vectors indicated by the normal image, and corrects the rotation amount of that position and orientation; and the translation correction unit 14, which determines the valid pixels on the image based on the acquisition result of the image acquisition unit 11 and the position and orientation corrected by the rotation correction unit 13, then compares the three-dimensional positions of the three-dimensional model of the workpiece projected onto the image at that position and orientation with the three-dimensional positions indicated by the position image, and corrects the translation amount of that position and orientation. As a result, the position and orientation estimation device 1 according to Embodiment 1 can estimate the position and orientation of the workpiece more accurately than conventional techniques, even when part of the workpiece is occluded by another object.

Within the scope of the invention, any component of the embodiment may be modified, and any component of the embodiment may be omitted.

1 Position and orientation estimation device
11 Image acquisition unit
12 Rough estimation unit
13 Rotation correction unit
14 Translation correction unit
131 Projection unit (first projection unit)
132 Pixel determination unit (first pixel determination unit)
133 Rotation update unit
141 Projection unit (second projection unit)
142 Pixel determination unit (second pixel determination unit)
143 Translation update unit

Claims

A position and orientation estimation device comprising:
an image acquisition unit that acquires a visible image, a position image indicating, for each pixel, the three-dimensional position of an object shown in the visible image, and a normal image indicating, for each pixel, the normal vector of the object shown in the visible image;
a rough estimation unit that estimates the position and orientation of a workpiece based on the visible image acquired by the image acquisition unit;
a rotation correction unit that determines the valid pixels on the image based on the acquisition result of the image acquisition unit and the position and orientation estimated by the rough estimation unit, then compares the normal vectors of a three-dimensional model of the workpiece projected onto the image at that position and orientation with the normal vectors indicated by the normal image, and corrects the rotation amount of that position and orientation; and
a translation correction unit that determines the valid pixels on the image based on the acquisition result of the image acquisition unit and the position and orientation corrected by the rotation correction unit, then compares the three-dimensional positions of the three-dimensional model of the workpiece projected onto the image at that position and orientation with the three-dimensional positions indicated by the position image, and corrects the translation amount of that position and orientation.

The position and orientation estimation device according to claim 1, wherein the rotation correction unit comprises:
a first projection unit that generates a projected label image in which each pixel is given identification information of the plane to which it belongs when the three-dimensional model of the workpiece is projected onto the image at the position and orientation estimated by the rough estimation unit, a projected position image indicating, for each pixel, the three-dimensional position of the three-dimensional model of the workpiece when projected onto the image at that position and orientation, and a projected normal image indicating, for each pixel, the normal vector of the three-dimensional model of the workpiece when projected onto the image at that position and orientation;
a first pixel determination unit that, based on the acquisition result of the image acquisition unit and the generation result of the first projection unit, determines the pixels of the projected label image that belong to a valid plane to be valid pixels; and
a rotation update unit that, based on the determination result of the first pixel determination unit, updates the rotation amount of the position and orientation estimated by the rough estimation unit by generating a rotation matrix, based on singular value decomposition, from the difference between the normal vectors indicated by the projected normal image generated by the first projection unit and the normal vectors indicated by the normal image acquired by the image acquisition unit.

The position and orientation estimation device according to claim 1 or 2, wherein the translation correction unit comprises:
a second projection unit that generates a projected label image in which each pixel is given identification information of the plane to which it belongs when the three-dimensional model of the workpiece is projected onto the image at the position and orientation corrected by the rotation correction unit, a projected position image indicating, for each pixel, the three-dimensional position of the three-dimensional model of the workpiece when projected onto the image at that position and orientation, and a projected normal image indicating, for each pixel, the normal vector of the three-dimensional model of the workpiece when projected onto the image at that position and orientation;
a second pixel determination unit that, based on the acquisition result of the image acquisition unit and the generation result of the second projection unit, determines the pixels of the projected label image that belong to a valid plane to be valid pixels; and
a translation update unit that, based on the determination result of the second pixel determination unit, calculates the equation of each plane of the object from the position image acquired by the image acquisition unit, and updates the translation amount of the position and orientation corrected by the rotation correction unit by the least squares method, based on the positional relationship between each pixel in the projected position image generated by the second projection unit and its corresponding plane.

A position and orientation estimation method comprising:
an image acquisition step of acquiring a visible image, a position image indicating, for each pixel, the three-dimensional position of an object shown in the visible image, and a normal image indicating, for each pixel, the normal vector of the object shown in the visible image;
a rough estimation step of estimating the position and orientation of a workpiece based on the visible image acquired in the image acquisition step;
a rotation correction step of determining the valid pixels on the image based on the acquisition result of the image acquisition step and the position and orientation estimated in the rough estimation step, then comparing the normal vectors of a three-dimensional model of the workpiece projected onto the image at that position and orientation with the normal vectors indicated by the normal image, and correcting the rotation amount of that position and orientation; and
a translation correction step of determining the valid pixels on the image based on the acquisition result of the image acquisition step and the position and orientation corrected in the rotation correction step, then comparing the three-dimensional positions of the three-dimensional model of the workpiece projected onto the image at that position and orientation with the three-dimensional positions indicated by the position image, and correcting the translation amount of that position and orientation.