CN116245940A - Category-level six-degree-of-freedom object pose estimation method based on structure difference perception
- Publication number
- CN116245940A (application CN202310052012.8A)
- Authority
- CN
- China
- Legal status: Granted
Classifications
- G06T7/70—Image analysis; determining position or orientation of objects or cameras
- G06T7/11—Image analysis; segmentation; region-based segmentation
- G06V10/75—Image or video pattern matching; organisation of the matching processes, e.g. simultaneous or sequential comparisons; coarse-fine approaches, e.g. multi-scale approaches; using context analysis; selection of dictionaries
- G06V10/764—Image or video recognition using machine learning; classification, e.g. of video objects
- G06V10/806—Image or video recognition using machine learning; fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
- G06V10/82—Image or video recognition using machine learning; using neural networks
Abstract
Description
Technical Field
The present invention relates to the technical field of computer vision, and in particular to a category-level six-degree-of-freedom object pose estimation method based on structural difference perception.
Background Art
Estimating the six-degree-of-freedom (6D) pose of a real-world object from an image, that is, the object's position and orientation in the camera coordinate system, is a critical task; the pose consists of a three-dimensional rotation matrix and a three-dimensional translation vector. 6D object pose estimation is widely used in real-world applications such as 3D scene understanding, robotic grasping, virtual reality, and augmented reality. According to the level at which the estimated object is modeled, the task falls into two categories: (1) instance-level 6D pose estimation for a specific object, and (2) category-level 6D pose estimation for objects of the same category. When computing an object's pose, instance-level methods must know the object's own coordinate frame in advance; the origin of this world coordinate system is usually placed at the center of the object, that is, at its CAD model. For a new object in a real scene with no defined CAD model, an instance-level algorithm cannot estimate the pose at all, which severely limits its application in real scenes. To break this limitation, the category-level 6D pose estimation task was proposed: it can estimate the 6D pose of different object instances within the same category, even when some of those instances have no CAD model.
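For reference, the transform encoded by a category-level 6D pose (together with the scale s that category-level methods also recover) can be written as below; this is the standard formulation, stated here for clarity rather than quoted from the patent:

```latex
\mathbf{p}_{\mathrm{cam}} = s\, R\, \mathbf{p}_{\mathrm{obj}} + \mathbf{t},
\qquad R \in SO(3), \quad \mathbf{t} \in \mathbb{R}^{3}, \quad s > 0,
```

where p_obj is a point in the object's canonical (normalized) coordinate space and p_cam is its position in the camera frame.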
Wang et al. first proposed the category-level object 6D pose estimation task. To cope with the absence of CAD models when estimating object poses, they introduced the Normalized Object Coordinate Space (NOCS), a shared canonical representation of all possible object instances within a category: the object instance is first reconstructed in NOCS, and the pose transformation of the instance from NOCS to the camera coordinate system, that is, the object's 6D pose, is then computed. Because different instances of the same category can differ greatly in structure, reconstructing their NOCS models is very difficult, and this is the core challenge of category-level 6D pose estimation. To address it, SPD proposed learning a shape prior for each category and deforming that prior to fit each object instance, thereby reconstructing the instance's NOCS model and further improving pose accuracy; however, the ambiguity of the category prior information leaves the reconstructed NOCS model insufficiently precise.
Summary of the Invention
The technical problem to be solved by the present invention is to provide a category-level six-degree-of-freedom object pose estimation method based on structural difference perception that can improve the accuracy of 6D pose estimation.
To solve this technical problem, the present invention adopts the following technical solution: a category-level six-degree-of-freedom object pose estimation method based on structural difference perception, comprising the following steps (a minimal data-flow sketch follows the list):
inputting the depth map into an object detection and segmentation network to obtain an image patch of the target object and the target object's segmentation mask;
obtaining the observed point cloud of the object instance from the target object's segmentation mask and the depth map, and selecting the category prior corresponding to the target object based on the observed point cloud;
extracting features from the observed point cloud and the category prior to obtain instance geometric features and category geometric features;
inputting the instance geometric features and the category geometric features into an information interaction enhancement module, which implicitly models the geometric differences between them and supplements both sets of features with those differences, yielding enhanced instance geometric features and enhanced category geometric features;
inputting the geometric differences between the instance and category geometric features, together with the enhanced instance and category geometric features, into a semantic dynamic fusion module, which fuses semantic and geometric information to obtain instance fusion features and category fusion features;
feeding the category fusion features into a deformation network to obtain a deformation field, and deforming the category prior with the deformation field to obtain the instance NOCS model;
matching the instance NOCS model with the observed point cloud through a matching network, and computing the 6D pose and size of the target object from the resulting similarities.
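The following minimal sketch fixes only the data flow of these seven steps; each network is injected as a callable, and every name and intermediate shape is an illustrative assumption rather than the patent's implementation:

```python
def estimate_pose(depth, detect_segment, backproject, select_prior,
                  extract_feats, interact_enhance, dynamic_fuse,
                  deform_net, match_net, solve_pose):
    """Data flow of the seven claimed steps, with every stage injected."""
    patch, mask = detect_segment(depth)                          # step 1
    points = backproject(depth, mask)                            # step 2: observed cloud
    prior = select_prior(points)                                 #          category prior
    f_inst, f_cat = extract_feats(patch, points, prior)          # step 3
    diff, f_inst_e, f_cat_e = interact_enhance(f_inst, f_cat)    # step 4
    fuse_inst, fuse_cat = dynamic_fuse(diff, f_inst_e, f_cat_e)  # step 5
    nocs_model = prior + deform_net(fuse_cat)                    # step 6: deform prior
    corr = match_net(fuse_inst, nocs_model)                      # step 7: soft matches
    return solve_pose(corr @ nocs_model, points)                 # 6D pose and size
```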
The object detection and segmentation network is a Mask-RCNN network.
The features of the observed point cloud and the category prior are extracted with a convolutional neural network and a PointNet++ network.
The information interaction enhancement module comprises: a fully connected layer for mapping the instance geometric features and the category geometric features into the same feature subspace; a matrix multiplication unit for multiplying the mapped features to obtain the structural relationship matrix between the instance and category geometric features; a normalization unit for normalizing the structural relationship matrix into weight coefficients; a weighted summation unit for weighting and summing the geometric projection features with these coefficients to obtain the structural difference features between the instance and category geometric features; and a multi-layer perceptron for fusing the structural differences with the instance and category geometric features respectively, yielding the enhanced instance and category geometric features.
For the enhanced instance geometric features, the semantic dynamic fusion module adopts a pixel-level fusion strategy, implemented as a corresponding-point fusion module, to explore the intrinsic mapping between data sources and obtain the instance fusion features. For the enhanced category geometric features and instance geometric features, which come from different individuals, the module dynamically adjusts the enhanced instance geometric features according to the geometric differences between the instance and category geometric features, and fuses the adjusted features with the enhanced category geometric features to obtain the category fusion features.
Beneficial Effects
Owing to the above technical solution, the present invention has the following advantages and positive effects over the prior art: it uses the structural differences between the object instance and the category prior to enhance the learning of intra-class shape information, and its semantic dynamic fusion module further adjusts the semantic information dynamically according to the geometric relationship between the object instance and the category prior, then fuses it with the enhanced category prior to dynamically compensate for missing geometric information, improving robustness to noise.
Brief Description of the Drawings
Fig. 1 is a flowchart of the category-level six-degree-of-freedom object pose estimation method based on structural difference perception according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the information interaction enhancement module in an embodiment of the present invention;
Fig. 3 is a schematic diagram of the semantic dynamic fusion module in an embodiment of the present invention;
Fig. 4 is a schematic diagram of the observed point clouds of different object instances;
Fig. 5 compares the results of an embodiment of the present invention with those of the SPD method.
Detailed Description of the Embodiments
The present invention is further described below in conjunction with specific embodiments. It should be understood that these embodiments are intended only to illustrate the present invention and not to limit its scope. Furthermore, it should be understood that, after reading the teachings of the present invention, those skilled in the art may make various changes or modifications to it, and such equivalent forms likewise fall within the scope defined by the claims appended to this application.
An embodiment of the present invention relates to a category-level six-degree-of-freedom object pose estimation method based on structural difference perception. The method uses the structural differences between the object instance and the category prior to enhance the learning of intra-class shape information and, through a semantic dynamic fusion module, dynamically adjusts the semantic information according to the geometric relationship between the object instance and the category prior, then fuses it with the enhanced category prior to dynamically compensate for missing geometric information. As shown in Fig. 1, the method comprises the following steps:
Step 1: input the depth map into the object detection and segmentation network to obtain an image patch of the target object and its segmentation mask. An existing detection and segmentation network, such as Mask-RCNN, can be used in this step to obtain the target object's image patch and segmentation mask.
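The patent does not fix an implementation of the detector. As one illustrative option (an assumption, and one typically run on the registered RGB frame rather than the raw depth map), an off-the-shelf Detectron2 Mask-RCNN predictor can supply the patch and mask:

```python
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
predictor = DefaultPredictor(cfg)

def detect_segment(bgr_image):
    """Return the highest-scoring instance's image patch and binary mask."""
    inst = predictor(bgr_image)["instances"]
    best = int(inst.scores.argmax())
    mask = inst.pred_masks[best].cpu().numpy()           # (H, W) bool mask
    x0, y0, x1, y1 = inst.pred_boxes.tensor[best].tolist()
    patch = bgr_image[int(y0):int(y1), int(x0):int(x1)]  # cropped image patch
    return patch, mask
```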
Step 2: obtain the observed point cloud of the object instance from the target object's segmentation mask and the depth map, and select the category prior corresponding to the target object based on the observed point cloud.
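The description does not spell out the point-cloud recovery; a minimal sketch of the standard pinhole back-projection, assuming known camera intrinsics fx, fy, cx, cy and a metric depth map:

```python
import numpy as np

def backproject(depth, mask, fx, fy, cx, cy):
    """Lift masked depth pixels to a 3D point cloud in the camera frame."""
    v, u = np.nonzero(mask & (depth > 0))   # pixel rows/cols inside the mask
    z = depth[v, u]
    x = (u - cx) * z / fx                   # pinhole model: u = fx*x/z + cx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)      # (N, 3) observed point cloud
```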
Step 3: extract features from the observed point cloud and the category prior to obtain instance geometric features and category geometric features. In this step, a convolutional neural network and PointNet++ can be used to extract the image semantic features and the point cloud geometric features, respectively, yielding the instance geometric features and category geometric features.
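As a stand-in for the PointNet++ backbone (whose set-abstraction layers are omitted here for brevity), a minimal per-point shared MLP fixes the (B, N, C) feature convention assumed by the module sketches below:

```python
import torch.nn as nn

class PointMLP(nn.Module):
    """Per-point shared MLP producing (B, N, c_out) geometric features;
    a simplified stand-in for the PointNet++ backbone named in the patent."""
    def __init__(self, c_out=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, c_out, 1), nn.ReLU(),
        )

    def forward(self, pts):                 # pts: (B, N, 3)
        f = self.mlp(pts.transpose(1, 2))   # (B, c_out, N)
        return f.transpose(1, 2)            # (B, N, c_out)
```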
Step 4: input the instance geometric features and category geometric features into the information interaction enhancement module, which implicitly models the geometric differences between them and supplements both sets of features with those differences, yielding enhanced instance geometric features and enhanced category geometric features.
The information interaction enhancement module in this step is designed to learn the structural relationship between the instance point cloud and the category prior, helping to construct their structural difference information at the feature level. It supplements the original geometric features with structural difference features, so that the enhanced geometric features capture both the unique individuality of the instance structure and the general commonality of the category prior. On the one hand, thanks to the supplementary instance structure characteristics, the enhanced category geometric features allow a more accurate instance NOCS model to be reconstructed. On the other hand, the instance geometric features gain the commonality of the category shape, so that the reconstructed correspondence matrix associates the observed point cloud with the NOCS model more reliably. Moreover, since the geometric differences between the category prior and different instances of the same category vary, the information interaction enhancement module can adapt to previously unseen instances of various shapes, greatly improving the generalization ability of this embodiment.
The structure of the information interaction enhancement module is shown in Fig. 2. It comprises: a fully connected layer that maps the instance geometric features and category geometric features into the same feature subspace; a matrix multiplication unit that multiplies the mapped features to obtain the structural relationship matrix between the instance and category geometric features; a normalization unit that normalizes the structural relationship matrix into weight coefficients; a weighted summation unit that weights and sums the geometric projection features with these coefficients to obtain the structural difference features between the instance and category geometric features; and a multi-layer perceptron that fuses the structural differences with the instance and category geometric features respectively, yielding the enhanced instance and category geometric features.
Thus the instance geometric features and category geometric features are mapped into the same feature subspace by fully connected layers, and their structural relationship matrix is obtained by matrix multiplication. The structural relationship matrix is then normalized into weight coefficients, and the geometric projection features are weighted and summed to obtain the structural difference features. Finally, a multi-layer perceptron fuses the original geometric features with the structural difference features to obtain the enhanced geometric features.
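Read as cross-attention between the two point sets, the module admits a minimal PyTorch sketch like the following. The layer sizes and the softmax used for normalization are assumptions, since the description names only the operations; the relationship matrix rel doubles as the geometric-difference signal passed on to step 5:

```python
import torch
import torch.nn as nn

class InteractionEnhance(nn.Module):
    """Structure-difference modeling between instance features f_i (B,N,C)
    and category-prior features f_c (B,M,C), followed by MLP fusion."""
    def __init__(self, c=64, d=64):
        super().__init__()
        self.proj_i = nn.Linear(c, d)   # map both inputs to a shared subspace
        self.proj_c = nn.Linear(c, d)
        self.fuse_i = nn.Sequential(nn.Linear(c + d, c), nn.ReLU())
        self.fuse_c = nn.Sequential(nn.Linear(c + d, c), nn.ReLU())

    def forward(self, f_i, f_c):
        p_i, p_c = self.proj_i(f_i), self.proj_c(f_c)
        rel = p_i @ p_c.transpose(1, 2)             # (B,N,M) structural relationship
        w_i = rel.softmax(dim=2)                    # normalize to weight coefficients
        w_c = rel.softmax(dim=1).transpose(1, 2)    # (B,M,N)
        diff_i = w_i @ p_c                          # difference feature per instance point
        diff_c = w_c @ p_i                          # difference feature per prior point
        f_i_e = self.fuse_i(torch.cat([f_i, diff_i], dim=2))  # enhanced instance feats
        f_c_e = self.fuse_c(torch.cat([f_c, diff_c], dim=2))  # enhanced category feats
        return rel, f_i_e, f_c_e
```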
Step 5: input the geometric differences between the instance and category geometric features, together with the enhanced instance and category geometric features, into the semantic dynamic fusion module, which fuses semantic and geometric information to obtain the instance fusion features and category fusion features.
As shown in Fig. 4, the object instance point cloud obtained from the detection and segmentation model may contain noise points. When the influence of these noise points propagates to the category prior, it can in principle degrade the reconstruction accuracy of the NOCS model, biasing the correspondence between the object instance point cloud and its NOCS model. To address this, the present embodiment designs a semantic dynamic fusion module that improves the network's robustness to noise points by fully fusing geometric and semantic information.
Fig. 3 shows the semantic dynamic fusion module. For the enhanced instance geometric features, it adopts a pixel-level fusion strategy, implemented as a corresponding-point fusion module, to explore the intrinsic mapping between data sources and obtain the instance fusion features. For the enhanced category geometric features and instance features, which come from different individuals, it uses the geometric differences between the instance and category geometric features to dynamically adjust the enhanced instance features, and fuses the adjusted features with the enhanced category geometric features to obtain the category fusion features. In other words, this embodiment borrows from DenseFusion and implements a corresponding-point fusion module with a pixel-level fusion strategy to explore the intrinsic mapping between data sources. For category geometric features and instance semantic features, which come from different individuals and therefore have no pixel-level correspondence, the pixel-level strategy cannot be applied directly, so this embodiment considers two fusion strategies. The first follows the general idea of feature fusion: concatenate the two and fuse them with an MLP, here called direct fusion. Although direct fusion can improve performance by absorbing semantic information, it still handles the cross-individual problem poorly. The present embodiment therefore also designs a semantic fusion strategy that dynamically adjusts the instance semantic features according to the structural relationship matrix between the instance and the category, and then fuses them with the category geometric features.
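A minimal sketch of the semantic fusion branch that produces the category fusion features, assuming the structural relationship matrix rel from the sketch above supplies the instance-to-prior weighting; the DenseFusion-style pixel-level instance branch is omitted and all dimensions are illustrative:

```python
import torch
import torch.nn as nn

class SemanticDynamicFusion(nn.Module):
    """Category branch: re-weight instance semantic features f_s (B,N,Cs) with
    the (B,N,M) structural relationship matrix, then fuse the result with the
    enhanced category geometric features f_c_e (B,M,C)."""
    def __init__(self, c=64, c_sem=32):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(c + c_sem, c), nn.ReLU(),
                                 nn.Linear(c, c))

    def forward(self, rel, f_s, f_c_e):
        w = rel.softmax(dim=1)                   # weight instance points per prior point
        sem_on_prior = w.transpose(1, 2) @ f_s   # (B,M,Cs): adjusted semantics
        return self.mlp(torch.cat([f_c_e, sem_on_prior], dim=2))  # category fusion
```

Re-weighting the instance semantics per prior point means that instance points given low weight by the relationship matrix, such as segmentation noise, contribute little to the category fusion feature, which is the robustness argument made above.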
Step 6: feed the category fusion features into the deformation network to obtain a deformation field, and deform the category prior with the deformation field to obtain the instance NOCS model.
Step 7: match the instance NOCS model with the observed point cloud through the matching network, and compute the 6D pose and size of the target object from the similarities.
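The description leaves the final solver unspecified; one common choice in NOCS-style pipelines (an assumption here, not a claim of the patent) is to read per-point NOCS coordinates off the soft correspondences and recover scale, rotation, and translation with the Umeyama similarity alignment:

```python
import numpy as np

def umeyama(src, dst):
    """Similarity transform (s, R, t) with dst ~ s * R @ src + t; src/dst: (N,3)."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd.T @ xs / len(src)
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                        # guard against reflections
    R = U @ S @ Vt
    s = np.trace(np.diag(D) @ S) / xs.var(0).sum()
    t = mu_d - s * R @ mu_s
    return s, R, t

# corr: (N, M) row-stochastic correspondence matrix from the matching network;
# nocs_model: (M, 3) deformed prior; points: (N, 3) observed cloud.
# nocs_coords = corr @ nocs_model; s, R, t = umeyama(nocs_coords, points)
```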
As shown in Fig. 5, of the two sets of bounding boxes drawn around each target object, one is the ground truth and the other is the prediction. Compared with the SPD method, the poses estimated by the method of this embodiment are more accurate, particularly for categories with large shape variation such as cameras (the objects indicated by arrows in the figure), where the estimates of this embodiment are far better than those of SPD. This demonstrates that the method of this embodiment handles intra-class shape variation well.
In summary, the present invention uses the structural differences between the object instance and the category prior to enhance the learning of intra-class shape information; the semantic dynamic fusion module then dynamically adjusts the semantic information according to the geometric relationship between the object instance and the category prior and fuses it with the enhanced category prior to dynamically compensate for missing geometric information, thereby improving robustness to noise.
Claims (5)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310052012.8A CN116245940B (en) | 2023-02-02 | 2023-02-02 | Category-level six-degree-of-freedom object pose estimation method based on structural difference perception |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116245940A true CN116245940A (en) | 2023-06-09 |
CN116245940B CN116245940B (en) | 2024-04-05 |
Family
ID=86634232
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310052012.8A Active CN116245940B (en) | 2023-02-02 | 2023-02-02 | Category-level six-degree-of-freedom object pose estimation method based on structural difference perception |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116245940B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116958958A (en) * | 2023-07-31 | 2023-10-27 | 中国科学技术大学 | Self-adaptive class-level object attitude estimation method based on graph convolution double-flow shape prior |
CN117132650A (en) * | 2023-08-25 | 2023-11-28 | 中国科学技术大学 | Category-level 6D object pose estimation method based on point cloud image attention network |
WO2025035755A1 (en) * | 2023-08-16 | 2025-02-20 | 华为云计算技术有限公司 | Mesh model generation method and apparatus, and device |
Patent Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5850469A (en) * | 1996-07-09 | 1998-12-15 | General Electric Company | Real time tracking of camera pose |
CN110119148A (en) * | 2019-05-14 | 2019-08-13 | 深圳大学 | A kind of six-degree-of-freedom posture estimation method, device and computer readable storage medium |
CN112767478A (en) * | 2021-01-08 | 2021-05-07 | 北京航空航天大学 | Appearance guidance-based six-degree-of-freedom pose estimation method |
CN113393503A (en) * | 2021-05-24 | 2021-09-14 | 湖南大学 | Classification-driven shape prior deformation category-level object 6D pose estimation method |
CN114299150A (en) * | 2021-12-31 | 2022-04-08 | 河北工业大学 | A deep 6D pose estimation network model and workpiece pose estimation method |
KR20220065234A (en) * | 2020-11-13 | 2022-05-20 | 주식회사 플라잎 | Apparatus and method for estimating of 6d pose |
KR20220088289A (en) * | 2020-12-18 | 2022-06-27 | 삼성전자주식회사 | Apparatus and method for estimating object pose |
CN114863573A (en) * | 2022-07-08 | 2022-08-05 | 东南大学 | A Category-Level 6D Pose Estimation Method Based on Monocular RGB-D Images |
US20220292698A1 (en) * | 2021-03-11 | 2022-09-15 | Fudan University | Network and System for Pose and Size Estimation |
CN115187748A (en) * | 2022-07-14 | 2022-10-14 | 湘潭大学 | A class-level object centroid and pose estimation based on point clouds |
US20220362945A1 (en) * | 2021-05-14 | 2022-11-17 | Industrial Technology Research Institute | Object pose estimation system, execution method thereof and graphic user interface |
Non-Patent Citations (3)
Title |
---|
LU ZOU et al.: "6D-ViT: Category-Level 6D Object Pose Estimation via Transformer-Based Instance Representation Learning", IEEE *
MENG TIAN et al.: "Shape Prior Deformation for Categorical 6D Object Pose and Size Estimation", arXiv:2007.08454v1 *
SANG Hanbo et al.: "Category-level six-dimensional pose estimation based on deep 3D model representation", Journal of Communication University of China (Natural Science Edition) *
Also Published As
Publication number | Publication date |
---|---|
CN116245940B (en) | 2024-04-05 |
Similar Documents
Publication | Title
---|---
Atapour-Abarghouei et al. | Real-time monocular depth estimation using synthetic data with domain adaptation via image style transfer
CN111862201B | A relative pose estimation method for spatial non-cooperative targets based on deep learning
CN116245940B | Category-level six-degree-of-freedom object pose estimation method based on structural difference perception
Gou et al. | Cascade learning from adversarial synthetic images for accurate pupil detection
CN110751097B | Semi-supervised three-dimensional point cloud gesture key point detection method
CN112001859A | Method and system for repairing face image
CN117152330B | Point cloud 3D model mapping method and device based on deep learning
CN113393503A | Classification-driven shape prior deformation category-level object 6D pose estimation method
CN110895683A | Kinect-based single-viewpoint gesture and posture recognition method
CN116772820A | Local refinement mapping system and method based on SLAM and semantic segmentation
CN114638866A | Point cloud registration method and system based on local feature learning
CN112801945A | Depth Gaussian mixture model skull registration method based on dual attention mechanism feature extraction
CN114882524A | Monocular three-dimensional gesture estimation method based on full convolution neural network
CN118261979A | A category-level 6D pose estimation method based on geometric information enhancement
CN116152334A | Image processing method and related equipment
JP7464512B2 | 3D human posture estimation device, method and program
CN113763536A | A 3D reconstruction method based on RGB image
CN118429421A | Masked Point-Transformer-based bimodal fusion 6D pose estimation method
Akizuki et al. | ASM-Net: Category-level Pose and Shape Estimation Using Parametric Deformation
CN116468793A | Image processing method, device, electronic equipment and storage medium
Li et al. | SD-Pose: Structural discrepancy aware category-level 6D object pose estimation
CN114723809A | Method and device for estimating the pose of an object, and electronic device
Liu et al. | CMT-6D: a lightweight iterative 6DoF pose estimation network based on cross-modal Transformer
CN116067360B | Robot map construction method based on double constraints, storage medium and equipment
CN117745948A | Space target image three-dimensional reconstruction method based on improved TransMVSNet deep learning algorithm
Legal Events
Date | Code | Title | Description
---|---|---|---
 | PB01 | Publication |
 | SE01 | Entry into force of request for substantive examination |
 | GR01 | Patent grant |