CN116245940A - Category-level six-degree-of-freedom object pose estimation method based on structure difference perception
- Publication number
- CN116245940A (application CN202310052012.8A)
- Authority
- CN
- China
- Legal status: Granted
Classifications
- G06T7/70—Image analysis; determining position or orientation of objects or cameras
- G06T7/11—Image analysis; segmentation; region-based segmentation
- G06V10/75—Image or video pattern matching; organisation of the matching processes, e.g. simultaneous or sequential comparisons; coarse-fine approaches, e.g. multi-scale approaches; using context analysis; selection of dictionaries
- G06V10/764—Image or video recognition using machine learning; classification, e.g. of video objects
- G06V10/806—Image or video recognition using machine learning; fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
- G06V10/82—Image or video recognition using machine learning; using neural networks
Abstract
Description
Technical Field
The present invention relates to the technical field of computer vision, and in particular to a category-level six-degree-of-freedom object pose estimation method based on structural difference perception.
Background Art
Estimating the six-degree-of-freedom (6D) pose of a real-world object from an image, that is, the object's position and orientation in the camera coordinate system, is a critical task; the pose consists of a three-dimensional rotation matrix and a three-dimensional translation vector. 6D object pose estimation is widely used in real-world applications such as 3D scene understanding, robotic grasping, virtual reality, and augmented reality. According to the level at which the estimated object is modeled, the task falls into two categories: (1) instance-level 6D pose estimation for a specific object, and (2) category-level 6D pose estimation for objects of the same category. When computing an object's pose, instance-level methods must know the object's own coordinate frame in advance; the origin of this world coordinate system is usually placed at the center of the object, that is, at its CAD model. For a new object in a real scene with no defined CAD model, an instance-level algorithm cannot estimate the pose at all, which severely limits its application in real scenes. To break this limitation, the category-level 6D pose estimation task was proposed: it can estimate the 6D pose of different object instances within the same category, even when some of those instances have no CAD model.
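For reference, the transform encoded by a category-level 6D pose (together with the scale s that category-level methods also recover) can be written as below; this is the standard formulation, stated here for clarity rather than quoted from the patent:

```latex
\mathbf{p}_{\mathrm{cam}} = s\, R\, \mathbf{p}_{\mathrm{obj}} + \mathbf{t},
\qquad R \in SO(3), \quad \mathbf{t} \in \mathbb{R}^{3}, \quad s > 0,
```

where p_obj is a point in the object's canonical (normalized) coordinate space and p_cam is its position in the camera frame.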
Wang et al. first proposed the category-level object 6D pose estimation task. To cope with the absence of CAD models when estimating object poses, they introduced the Normalized Object Coordinate Space (NOCS), a shared canonical representation of all possible object instances within a category: the object instance is first reconstructed in NOCS, and the pose transformation of the instance from NOCS to the camera coordinate system, that is, the object's 6D pose, is then computed. Because different instances of the same category can differ greatly in structure, reconstructing their NOCS models is very difficult, and this is the core challenge of category-level 6D pose estimation. To address it, SPD proposed learning a shape prior for each category and deforming that prior to fit each object instance, thereby reconstructing the instance's NOCS model and further improving pose accuracy; however, the ambiguity of the category prior information leaves the reconstructed NOCS model insufficiently precise.
Summary of the Invention
The technical problem to be solved by the present invention is to provide a category-level six-degree-of-freedom object pose estimation method based on structural difference perception that can improve the accuracy of 6D pose estimation.
To solve this technical problem, the present invention adopts the following technical solution: a category-level six-degree-of-freedom object pose estimation method based on structural difference perception, comprising the following steps (a minimal data-flow sketch follows the list):
inputting the depth map into an object detection and segmentation network to obtain an image patch of the target object and the target object's segmentation mask;
obtaining the observed point cloud of the object instance from the target object's segmentation mask and the depth map, and selecting the category prior corresponding to the target object based on the observed point cloud;
extracting features from the observed point cloud and the category prior to obtain instance geometric features and category geometric features;
inputting the instance geometric features and the category geometric features into an information interaction enhancement module, which implicitly models the geometric differences between them and supplements both sets of features with those differences, yielding enhanced instance geometric features and enhanced category geometric features;
inputting the geometric differences between the instance and category geometric features, together with the enhanced instance and category geometric features, into a semantic dynamic fusion module, which fuses semantic and geometric information to obtain instance fusion features and category fusion features;
feeding the category fusion features into a deformation network to obtain a deformation field, and deforming the category prior with the deformation field to obtain the instance NOCS model;
matching the instance NOCS model with the observed point cloud through a matching network, and computing the 6D pose and size of the target object from the resulting similarities.
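The following minimal sketch fixes only the data flow of these seven steps; each network is injected as a callable, and every name and intermediate shape is an illustrative assumption rather than the patent's implementation:

```python
def estimate_pose(depth, detect_segment, backproject, select_prior,
                  extract_feats, interact_enhance, dynamic_fuse,
                  deform_net, match_net, solve_pose):
    """Data flow of the seven claimed steps, with every stage injected."""
    patch, mask = detect_segment(depth)                          # step 1
    points = backproject(depth, mask)                            # step 2: observed cloud
    prior = select_prior(points)                                 #          category prior
    f_inst, f_cat = extract_feats(patch, points, prior)          # step 3
    diff, f_inst_e, f_cat_e = interact_enhance(f_inst, f_cat)    # step 4
    fuse_inst, fuse_cat = dynamic_fuse(diff, f_inst_e, f_cat_e)  # step 5
    nocs_model = prior + deform_net(fuse_cat)                    # step 6: deform prior
    corr = match_net(fuse_inst, nocs_model)                      # step 7: soft matches
    return solve_pose(corr @ nocs_model, points)                 # 6D pose and size
```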
The object detection and segmentation network is a Mask-RCNN network.
The features of the observed point cloud and the category prior are extracted with a convolutional neural network and a PointNet++ network.
The information interaction enhancement module comprises: a fully connected layer for mapping the instance geometric features and the category geometric features into the same feature subspace; a matrix multiplication unit for multiplying the mapped features to obtain the structural relationship matrix between the instance and category geometric features; a normalization unit for normalizing the structural relationship matrix into weight coefficients; a weighted summation unit for weighting and summing the geometric projection features with these coefficients to obtain the structural difference features between the instance and category geometric features; and a multi-layer perceptron for fusing the structural differences with the instance and category geometric features respectively, yielding the enhanced instance and category geometric features.
For the enhanced instance geometric features, the semantic dynamic fusion module adopts a pixel-level fusion strategy, implemented as a corresponding-point fusion module, to explore the intrinsic mapping between data sources and obtain the instance fusion features. For the enhanced category geometric features and instance geometric features, which come from different individuals, the module dynamically adjusts the enhanced instance geometric features according to the geometric differences between the instance and category geometric features, and fuses the adjusted features with the enhanced category geometric features to obtain the category fusion features.
Beneficial Effects
Owing to the above technical solution, the present invention has the following advantages and positive effects over the prior art: it uses the structural differences between the object instance and the category prior to enhance the learning of intra-class shape information, and its semantic dynamic fusion module further adjusts the semantic information dynamically according to the geometric relationship between the object instance and the category prior, then fuses it with the enhanced category prior to dynamically compensate for missing geometric information, improving robustness to noise.
Brief Description of the Drawings
Fig. 1 is a flowchart of the category-level six-degree-of-freedom object pose estimation method based on structural difference perception according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the information interaction enhancement module in an embodiment of the present invention;
Fig. 3 is a schematic diagram of the semantic dynamic fusion module in an embodiment of the present invention;
Fig. 4 is a schematic diagram of the observed point clouds of different object instances;
Fig. 5 compares the results of an embodiment of the present invention with those of the SPD method.
Detailed Description of the Embodiments
The present invention is further described below in conjunction with specific embodiments. It should be understood that these embodiments are intended only to illustrate the present invention and not to limit its scope. Furthermore, it should be understood that, after reading the teachings of the present invention, those skilled in the art may make various changes or modifications to it, and such equivalent forms likewise fall within the scope defined by the claims appended to this application.
An embodiment of the present invention relates to a category-level six-degree-of-freedom object pose estimation method based on structural difference perception. The method uses the structural differences between the object instance and the category prior to enhance the learning of intra-class shape information and, through a semantic dynamic fusion module, dynamically adjusts the semantic information according to the geometric relationship between the object instance and the category prior, then fuses it with the enhanced category prior to dynamically compensate for missing geometric information. As shown in Fig. 1, the method comprises the following steps:
Step 1: input the depth map into the object detection and segmentation network to obtain an image patch of the target object and its segmentation mask. An existing detection and segmentation network, such as Mask-RCNN, can be used in this step to obtain the target object's image patch and segmentation mask.
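The patent does not fix an implementation of the detector. As one illustrative option (an assumption, and one typically run on the registered RGB frame rather than the raw depth map), an off-the-shelf Detectron2 Mask-RCNN predictor can supply the patch and mask:

```python
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
predictor = DefaultPredictor(cfg)

def detect_segment(bgr_image):
    """Return the highest-scoring instance's image patch and binary mask."""
    inst = predictor(bgr_image)["instances"]
    best = int(inst.scores.argmax())
    mask = inst.pred_masks[best].cpu().numpy()           # (H, W) bool mask
    x0, y0, x1, y1 = inst.pred_boxes.tensor[best].tolist()
    patch = bgr_image[int(y0):int(y1), int(x0):int(x1)]  # cropped image patch
    return patch, mask
```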
Step 2: obtain the observed point cloud of the object instance from the target object's segmentation mask and the depth map, and select the category prior corresponding to the target object based on the observed point cloud.
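The description does not spell out the point-cloud recovery; a minimal sketch of the standard pinhole back-projection, assuming known camera intrinsics fx, fy, cx, cy and a metric depth map:

```python
import numpy as np

def backproject(depth, mask, fx, fy, cx, cy):
    """Lift masked depth pixels to a 3D point cloud in the camera frame."""
    v, u = np.nonzero(mask & (depth > 0))   # pixel rows/cols inside the mask
    z = depth[v, u]
    x = (u - cx) * z / fx                   # pinhole model: u = fx*x/z + cx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)      # (N, 3) observed point cloud
```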
Step 3: extract features from the observed point cloud and the category prior to obtain instance geometric features and category geometric features. In this step, a convolutional neural network and PointNet++ can be used to extract the image semantic features and the point cloud geometric features, respectively, yielding the instance geometric features and category geometric features.
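As a stand-in for the PointNet++ backbone (whose set-abstraction layers are omitted here for brevity), a minimal per-point shared MLP fixes the (B, N, C) feature convention assumed by the module sketches below:

```python
import torch.nn as nn

class PointMLP(nn.Module):
    """Per-point shared MLP producing (B, N, c_out) geometric features;
    a simplified stand-in for the PointNet++ backbone named in the patent."""
    def __init__(self, c_out=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, c_out, 1), nn.ReLU(),
        )

    def forward(self, pts):                 # pts: (B, N, 3)
        f = self.mlp(pts.transpose(1, 2))   # (B, c_out, N)
        return f.transpose(1, 2)            # (B, N, c_out)
```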
Step 4: input the instance geometric features and category geometric features into the information interaction enhancement module, which implicitly models the geometric differences between them and supplements both sets of features with those differences, yielding enhanced instance geometric features and enhanced category geometric features.
The information interaction enhancement module in this step is designed to learn the structural relationship between the instance point cloud and the category prior, helping to construct their structural difference information at the feature level. It supplements the original geometric features with structural difference features, so that the enhanced geometric features capture both the unique individuality of the instance structure and the general commonality of the category prior. On the one hand, thanks to the supplementary instance structure characteristics, the enhanced category geometric features allow a more accurate instance NOCS model to be reconstructed. On the other hand, the instance geometric features gain the commonality of the category shape, so that the reconstructed correspondence matrix associates the observed point cloud with the NOCS model more reliably. Moreover, since the geometric differences between the category prior and different instances of the same category vary, the information interaction enhancement module can adapt to previously unseen instances of various shapes, greatly improving the generalization ability of this embodiment.
The structure of the information interaction enhancement module is shown in Fig. 2. It comprises: a fully connected layer that maps the instance geometric features and category geometric features into the same feature subspace; a matrix multiplication unit that multiplies the mapped features to obtain the structural relationship matrix between the instance and category geometric features; a normalization unit that normalizes the structural relationship matrix into weight coefficients; a weighted summation unit that weights and sums the geometric projection features with these coefficients to obtain the structural difference features between the instance and category geometric features; and a multi-layer perceptron that fuses the structural differences with the instance and category geometric features respectively, yielding the enhanced instance and category geometric features.
Thus the instance geometric features and category geometric features are mapped into the same feature subspace by fully connected layers, and their structural relationship matrix is obtained by matrix multiplication. The structural relationship matrix is then normalized into weight coefficients, and the geometric projection features are weighted and summed to obtain the structural difference features. Finally, a multi-layer perceptron fuses the original geometric features with the structural difference features to obtain the enhanced geometric features.
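Read as cross-attention between the two point sets, the module admits a minimal PyTorch sketch like the following. The layer sizes and the softmax used for normalization are assumptions, since the description names only the operations; the relationship matrix rel doubles as the geometric-difference signal passed on to step 5:

```python
import torch
import torch.nn as nn

class InteractionEnhance(nn.Module):
    """Structure-difference modeling between instance features f_i (B,N,C)
    and category-prior features f_c (B,M,C), followed by MLP fusion."""
    def __init__(self, c=64, d=64):
        super().__init__()
        self.proj_i = nn.Linear(c, d)   # map both inputs to a shared subspace
        self.proj_c = nn.Linear(c, d)
        self.fuse_i = nn.Sequential(nn.Linear(c + d, c), nn.ReLU())
        self.fuse_c = nn.Sequential(nn.Linear(c + d, c), nn.ReLU())

    def forward(self, f_i, f_c):
        p_i, p_c = self.proj_i(f_i), self.proj_c(f_c)
        rel = p_i @ p_c.transpose(1, 2)             # (B,N,M) structural relationship
        w_i = rel.softmax(dim=2)                    # normalize to weight coefficients
        w_c = rel.softmax(dim=1).transpose(1, 2)    # (B,M,N)
        diff_i = w_i @ p_c                          # difference feature per instance point
        diff_c = w_c @ p_i                          # difference feature per prior point
        f_i_e = self.fuse_i(torch.cat([f_i, diff_i], dim=2))  # enhanced instance feats
        f_c_e = self.fuse_c(torch.cat([f_c, diff_c], dim=2))  # enhanced category feats
        return rel, f_i_e, f_c_e
```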
Step 5: input the geometric differences between the instance and category geometric features, together with the enhanced instance and category geometric features, into the semantic dynamic fusion module, which fuses semantic and geometric information to obtain the instance fusion features and category fusion features.
As shown in Fig. 4, the object instance point cloud obtained from the detection and segmentation model may contain noise points. When the influence of these noise points propagates to the category prior, it can in principle degrade the reconstruction accuracy of the NOCS model, biasing the correspondence between the object instance point cloud and its NOCS model. To address this, the present embodiment designs a semantic dynamic fusion module that improves the network's robustness to noise points by fully fusing geometric and semantic information.
Fig. 3 shows the semantic dynamic fusion module. For the enhanced instance geometric features, it adopts a pixel-level fusion strategy, implemented as a corresponding-point fusion module, to explore the intrinsic mapping between data sources and obtain the instance fusion features. For the enhanced category geometric features and instance features, which come from different individuals, it uses the geometric differences between the instance and category geometric features to dynamically adjust the enhanced instance features, and fuses the adjusted features with the enhanced category geometric features to obtain the category fusion features. In other words, this embodiment borrows from DenseFusion and implements a corresponding-point fusion module with a pixel-level fusion strategy to explore the intrinsic mapping between data sources. For category geometric features and instance semantic features, which come from different individuals and therefore have no pixel-level correspondence, the pixel-level strategy cannot be applied directly, so this embodiment considers two fusion strategies. The first follows the general idea of feature fusion: concatenate the two and fuse them with an MLP, here called direct fusion. Although direct fusion can improve performance by absorbing semantic information, it still handles the cross-individual problem poorly. The present embodiment therefore also designs a semantic fusion strategy that dynamically adjusts the instance semantic features according to the structural relationship matrix between the instance and the category, and then fuses them with the category geometric features.
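A minimal sketch of the semantic fusion branch that produces the category fusion features, assuming the structural relationship matrix rel from the sketch above supplies the instance-to-prior weighting; the DenseFusion-style pixel-level instance branch is omitted and all dimensions are illustrative:

```python
import torch
import torch.nn as nn

class SemanticDynamicFusion(nn.Module):
    """Category branch: re-weight instance semantic features f_s (B,N,Cs) with
    the (B,N,M) structural relationship matrix, then fuse the result with the
    enhanced category geometric features f_c_e (B,M,C)."""
    def __init__(self, c=64, c_sem=32):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(c + c_sem, c), nn.ReLU(),
                                 nn.Linear(c, c))

    def forward(self, rel, f_s, f_c_e):
        w = rel.softmax(dim=1)                   # weight instance points per prior point
        sem_on_prior = w.transpose(1, 2) @ f_s   # (B,M,Cs): adjusted semantics
        return self.mlp(torch.cat([f_c_e, sem_on_prior], dim=2))  # category fusion
```

Re-weighting the instance semantics per prior point means that instance points given low weight by the relationship matrix, such as segmentation noise, contribute little to the category fusion feature, which is the robustness argument made above.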
Step 6: feed the category fusion features into the deformation network to obtain a deformation field, and deform the category prior with the deformation field to obtain the instance NOCS model.
Step 7: match the instance NOCS model with the observed point cloud through the matching network, and compute the 6D pose and size of the target object from the similarities.
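The description leaves the final solver unspecified; one common choice in NOCS-style pipelines (an assumption here, not a claim of the patent) is to read per-point NOCS coordinates off the soft correspondences and recover scale, rotation, and translation with the Umeyama similarity alignment:

```python
import numpy as np

def umeyama(src, dst):
    """Similarity transform (s, R, t) with dst ~ s * R @ src + t; src/dst: (N,3)."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd.T @ xs / len(src)
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                        # guard against reflections
    R = U @ S @ Vt
    s = np.trace(np.diag(D) @ S) / xs.var(0).sum()
    t = mu_d - s * R @ mu_s
    return s, R, t

# corr: (N, M) row-stochastic correspondence matrix from the matching network;
# nocs_model: (M, 3) deformed prior; points: (N, 3) observed cloud.
# nocs_coords = corr @ nocs_model; s, R, t = umeyama(nocs_coords, points)
```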
As shown in Fig. 5, of the two sets of bounding boxes drawn around each target object, one is the ground truth and the other is the prediction. Compared with the SPD method, the poses estimated by the method of this embodiment are more accurate, particularly for categories with large shape variation such as cameras (the objects indicated by arrows in the figure), where the estimates of this embodiment are far better than those of SPD. This demonstrates that the method of this embodiment handles intra-class shape variation well.
In summary, the present invention uses the structural differences between the object instance and the category prior to enhance the learning of intra-class shape information; the semantic dynamic fusion module then dynamically adjusts the semantic information according to the geometric relationship between the object instance and the category prior and fuses it with the enhanced category prior to dynamically compensate for missing geometric information, thereby improving robustness to noise.
Claims (5)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310052012.8A CN116245940B (en) | 2023-02-02 | 2023-02-02 | Category-level six-degree-of-freedom object pose estimation method based on structural difference perception |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116245940A true CN116245940A (en) | 2023-06-09 |
CN116245940B CN116245940B (en) | 2024-04-05 |
Family
ID=86634232
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310052012.8A Active CN116245940B (en) | 2023-02-02 | 2023-02-02 | Category-level six-degree-of-freedom object pose estimation method based on structural difference perception |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116245940B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116958958A (en) * | 2023-07-31 | 2023-10-27 | 中国科学技术大学 | Self-adaptive class-level object attitude estimation method based on graph convolution double-flow shape prior |
CN117132650A (en) * | 2023-08-25 | 2023-11-28 | 中国科学技术大学 | Category-level 6D object pose estimation method based on point cloud image attention network |
WO2025035755A1 (en) * | 2023-08-16 | 2025-02-20 | 华为云计算技术有限公司 | Mesh model generation method and apparatus, and device |
Patent Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5850469A (en) * | 1996-07-09 | 1998-12-15 | General Electric Company | Real time tracking of camera pose |
CN110119148A (en) * | 2019-05-14 | 2019-08-13 | 深圳大学 | A kind of six-degree-of-freedom posture estimation method, device and computer readable storage medium |
CN112767478A (en) * | 2021-01-08 | 2021-05-07 | 北京航空航天大学 | Appearance guidance-based six-degree-of-freedom pose estimation method |
CN113393503A (en) * | 2021-05-24 | 2021-09-14 | 湖南大学 | Classification-driven shape prior deformation category-level object 6D pose estimation method |
CN114299150A (en) * | 2021-12-31 | 2022-04-08 | 河北工业大学 | A deep 6D pose estimation network model and workpiece pose estimation method |
KR20220065234A (en) * | 2020-11-13 | 2022-05-20 | 주식회사 플라잎 | Apparatus and method for estimating of 6d pose |
KR20220088289A (en) * | 2020-12-18 | 2022-06-27 | 삼성전자주식회사 | Apparatus and method for estimating object pose |
CN114863573A (en) * | 2022-07-08 | 2022-08-05 | 东南大学 | A Category-Level 6D Pose Estimation Method Based on Monocular RGB-D Images |
US20220292698A1 (en) * | 2021-03-11 | 2022-09-15 | Fudan University | Network and System for Pose and Size Estimation |
CN115187748A (en) * | 2022-07-14 | 2022-10-14 | 湘潭大学 | A class-level object centroid and pose estimation based on point clouds |
US20220362945A1 (en) * | 2021-05-14 | 2022-11-17 | Industrial Technology Research Institute | Object pose estimation system, execution method thereof and graphic user interface |
Non-Patent Citations (3)
Title |
---|
LU ZOU et al.: "6D-ViT: Category-Level 6D Object Pose Estimation via Transformer-Based Instance Representation Learning", IEEE *
MENG TIAN et al.: "Shape Prior Deformation for Categorical 6D Object Pose and Size Estimation", arXiv:2007.08454v1 *
SANG Hanbo et al.: "Category-level six-dimensional pose estimation based on deep 3D model representation", Journal of Communication University of China (Natural Science Edition) *
Also Published As
Publication number | Publication date |
---|---|
CN116245940B (en) | 2024-04-05 |
Similar Documents
Publication | Title
---|---
Atapour-Abarghouei et al. | Real-time monocular depth estimation using synthetic data with domain adaptation via image style transfer
CN111862201B | A relative pose estimation method for spatial non-cooperative targets based on deep learning
CN116245940B | Category-level six-degree-of-freedom object pose estimation method based on structural difference perception
Gou et al. | Cascade learning from adversarial synthetic images for accurate pupil detection
CN110751097B | Semi-supervised three-dimensional point cloud gesture key point detection method
CN112001859A | Method and system for repairing face image
CN117152330B | Point cloud 3D model mapping method and device based on deep learning
CN113393503A | Classification-driven shape prior deformation category-level object 6D pose estimation method
CN110895683A | Kinect-based single-viewpoint gesture and posture recognition method
CN116772820A | Local refinement mapping system and method based on SLAM and semantic segmentation
CN114638866A | Point cloud registration method and system based on local feature learning
CN112801945A | Depth Gaussian mixture model skull registration method based on dual attention mechanism feature extraction
CN114882524A | Monocular three-dimensional gesture estimation method based on full convolution neural network
CN118261979A | A category-level 6D pose estimation method based on geometric information enhancement
CN116152334A | Image processing method and related equipment
JP7464512B2 | 3D human posture estimation device, method and program
CN113763536A | A 3D reconstruction method based on RGB image
CN118429421A | Masked Point-Transformer-based bimodal fusion 6D pose estimation method
Akizuki et al. | ASM-Net: Category-level Pose and Shape Estimation Using Parametric Deformation
CN116468793A | Image processing method, device, electronic equipment and storage medium
Li et al. | SD-Pose: Structural discrepancy aware category-level 6D object pose estimation
CN114723809A | Method and device for estimating the pose of an object, and electronic device
Liu et al. | CMT-6D: a lightweight iterative 6DoF pose estimation network based on cross-modal Transformer
CN116067360B | Robot map construction method based on double constraints, storage medium and equipment
CN117745948A | Space target image three-dimensional reconstruction method based on improved TransMVSNet deep learning algorithm
Legal Events
Date | Code | Title | Description
---|---|---|---
 | PB01 | Publication |
 | SE01 | Entry into force of request for substantive examination |
 | GR01 | Patent grant |