
CN113160411A - Indoor three-dimensional reconstruction method based on RGB-D sensor - Google Patents

Info

Publication number
CN113160411A
CN113160411A (application CN202110441618.1A)
Authority
CN
China
Prior art keywords
scene
rgb
model
cad
cad model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202110441618.1A
Other languages
Chinese (zh)
Inventor
颜成钢
吕坤
朱尊杰
黄培武
徐枫
孙垚棋
张继勇
张勇东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority to CN202110441618.1A priority Critical patent/CN113160411A/en
Publication of CN113160411A publication Critical patent/CN113160411A/en
Withdrawn legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/10Constructive solid geometry [CSG] using solid primitives, e.g. cylinders, cubes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2200/00Indexing scheme for image data processing or generation, in general
    • G06T2200/04Indexing scheme for image data processing or generation, in general involving 3D image data

Landscapes

  • Physics & Mathematics (AREA)
  • Geometry (AREA)
  • Engineering & Computer Science (AREA)
  • Computer Graphics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an indoor three-dimensional reconstruction method based on an RGB-D sensor. By introducing RGB information into object recognition and classification and using a graph attention module, the method better avoids the situation where noise in the raw scan data degrades object recognition. Unlike previous reconstruction pipelines, CAD models replace the objects in the scanned scene to obtain a clean, compact representation. When the final reconstruction is complete, the keypoint differences between each CAD model and the corresponding scene object are compared, and iterative optimization reduces the alignment error so that the CAD model matches the scene object in both size and pose. The invention addresses inaccurate classification and recognition caused by sensor noise and by blur from sensor motion, as well as insufficient reconstruction accuracy of the overall scene; and because CAD models are introduced, the scene can be freely edited, which increases its flexibility.

Description

Indoor three-dimensional reconstruction method based on RGB-D sensor
Technical Field
The invention belongs to the field of computer vision and mainly relates to three-dimensional scene reconstruction. A CAD (computer-aided design) model of each corresponding object in a scene is jointly retrieved using geometric, functional, and RGB (red, green, blue) information, and the final overall layout is refined through repeated iteration to achieve higher-precision indoor three-dimensional reconstruction.
Background Art
Three-dimensional reconstruction refers to establishing a mathematical model of a three-dimensional object that is suitable for computer representation and processing. It is the basis for processing, operating on, and analyzing the properties of three-dimensional objects in a computer environment, and a key technology for building, inside a computer, a virtual reality that expresses the objective world.
In computer vision, three-dimensional reconstruction is the process of recovering three-dimensional information from single-view or multi-view images. Because the information in a single view is incomplete, reconstruction must draw on empirical knowledge.
In recent years, the widespread use of consumer-grade RGB-D sensors, such as Microsoft Kinect, Intel RealSense, and Google Tango, has driven significant progress in RGB-D reconstruction. A very prominent line of research is based on volumetric fusion, where depth data is integrated into a truncated signed distance function (TSDF). Many modern real-time reconstruction methods, such as KinectFusion, are based on this surface representation. To make the representation more memory-efficient, octree- or hash-based scene representations have been proposed. Another fusion approach is point-based; its reconstruction quality is slightly lower, but it is more flexible in handling scene dynamics and can adapt dynamically to loop closure. Recent RGB-D reconstruction frameworks combine efficient scene representations with global pose estimation. Meanwhile, the latest research adopts deep-learning-based reconstruction methods; although reconstruction quality has improved to a certain extent, sensor noise in the acquired data and blur caused by the sensor in motion mean that the resulting three-dimensional scans of the scene are still very noisy and often incomplete.
One solution to the above problem is to replace each incompletely scanned object with a CAD model: retrieve the CAD model of the scanned object from a model library, perform the replacement, and complete a 9-degree-of-freedom match (i.e., size, position, orientation). CAD models are complete, clean, and lightweight; if all objects in a scene can be represented this way, the problems of noisy or missing 3D scan data caused by sensor noise or motion blur can be solved. Moreover, because CAD models serve as the object representation in the scene, their editability gives the whole scene greater engineering value and allows a more flexible representation, so the scene reconstruction can serve higher-level applications such as AR/VR. However, finding a CAD model and conforming it to the input scan involves several separate steps: correspondence lookup, correspondence matching, and finally optimization of the potential matching correspondences for each candidate CAD model. These steps feed into one another from top to bottom, so a small error in any one link can make the final result differ greatly from the expected one. Even the deep-learning methods that have become so popular in recent years cannot fully overcome inaccuracies in the raw sensor data. The academic community therefore urgently needs a relatively comprehensive method that can generate feedback; once this problem is solved, indoor three-dimensional reconstruction technology will improve greatly.
Graph attention module:
A point cloud with three-dimensional coordinates and optional features is input into the graph attention convolution module. A k-nearest-neighbor (KNN) graph is computed from the spatial location of each point, generating a set of local neighborhoods whose features are concatenated with the global features computed by a global attention module. These concatenated features are fed into an MLP layer, whose output is multiplied element-wise by the edge attention weights and density attention weights obtained from the edge and density attention modules. After a further MLP layer and max pooling, a feature map with the same shape as the input data is finally obtained.
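As a concrete illustration, the following minimal PyTorch sketch keeps only the KNN graph and an edge-attention branch of such a module; the global and density attention branches are omitted, and the class name, layer sizes, and the way the attention weights are formed are illustrative assumptions rather than the patent's exact architecture.

```python
import torch
import torch.nn as nn

class GraphAttentionConv(nn.Module):
    """Sketch of a graph-attention convolution over a point cloud."""
    def __init__(self, in_dim: int, out_dim: int, k: int = 16):
        super().__init__()
        self.k = k
        self.mlp = nn.Sequential(nn.Linear(in_dim * 2, out_dim), nn.ReLU())
        self.edge_attn = nn.Linear(in_dim * 2, out_dim)  # edge attention weights

    def forward(self, xyz: torch.Tensor, feats: torch.Tensor) -> torch.Tensor:
        # xyz: (N, 3) point coordinates; feats: (N, C) per-point features.
        dists = torch.cdist(xyz, xyz)                    # (N, N) pairwise distances
        knn = dists.topk(self.k, largest=False).indices  # (N, k) neighbour indices
        neigh = feats[knn]                               # (N, k, C) neighbour features
        center = feats.unsqueeze(1).expand_as(neigh)     # (N, k, C) repeated centers
        edge = torch.cat([center, neigh - center], -1)   # (N, k, 2C) edge features
        attn = torch.softmax(self.edge_attn(edge), 1)    # attention over neighbours
        out = (self.mlp(edge) * attn).max(dim=1).values  # weighted, then max pooling
        return out                                       # (N, out_dim) feature map
```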
Fully connected layer:
Fully connected (FC) layers act as the "classifier" of a convolutional neural network. If convolutional layers, pooling layers, and activation layers map the raw data into a hidden feature space, the fully connected layer maps the learned "distributed feature representation" into the sample label space. In practice, a fully connected layer can be implemented by a convolution: a fully connected layer whose preceding layer is also fully connected can be converted into a convolution with a 1×1 kernel, while a fully connected layer whose preceding layer is convolutional can be converted into a global convolution with an h×w kernel, where h and w are the height and width of the preceding layer's output. After the geometric information and the RGB information are extracted separately, this layer drives each voxel toward a complete description, so that the result trained by the neural network matches expectations.
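The equivalence just described can be checked directly. The snippet below (standard CNN practice, not patent-specific code) copies the weights of a fully connected layer into an h×w convolution kernel and verifies that the two produce the same output.

```python
import torch
import torch.nn as nn

h, w, c, n_classes = 4, 4, 64, 10
x = torch.randn(1, c, h, w)                          # output of the last conv layer

fc = nn.Linear(c * h * w, n_classes)                 # ordinary fully connected layer
conv = nn.Conv2d(c, n_classes, kernel_size=(h, w))   # equivalent h×w "global" conv

# Copy the FC weights into the conv kernel; the two layers then agree exactly.
with torch.no_grad():
    conv.weight.copy_(fc.weight.view(n_classes, c, h, w))
    conv.bias.copy_(fc.bias)

assert torch.allclose(fc(x.flatten(1)), conv(x).flatten(1), atol=1e-5)
```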
Loss function: in the training process of a neural network, a loss function evaluates whether the network is adequately trained. The network aims to reduce this function's value as far as possible, and over repeated iterations it adjusts the relevant parameters to complete training.
Signed Distance Field (SDF): a field that stores, for each point, the distance to the nearest surface together with a sign indicating whether the point lies inside or outside the region; it can therefore be used to determine whether a point is in a region.
Voxel: a voxel (volume pixel, or volume element) is the three-dimensional analogue of a pixel in an RGB image. A volume containing voxels can be displayed by volume rendering or by extracting a polygonal isosurface at a given threshold contour. Because voxels play the same role as pixels, the heat-map prediction methods used in two-dimensional images can be transferred to three-dimensional space on the same principle to complete the matching of two three-dimensional objects.
Levenberg-Marquardt (LM) algorithm: the LM algorithm is an iterative algorithm for solving least-squares problems. It can be seen as a combination of the steepest-descent method and the Gauss-Newton method, switching between them by adjusting the damping parameter μ. When the current solution is far from the optimum, the algorithm behaves like steepest descent: slow, but guaranteed to descend. When the current solution is close to the optimum, the algorithm behaves like Gauss-Newton and converges quickly.
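The following compact NumPy sketch shows this damping behavior on a generic least-squares problem; it is a textbook implementation, not the solver used in the patent, and `residual` and `jacobian` are caller-supplied functions.

```python
import numpy as np

def levenberg_marquardt(residual, jacobian, x0, iters=100, mu=1e-3):
    """Minimize 0.5 * ||residual(x)||^2 starting from x0."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        r, J = residual(x), jacobian(x)
        g = J.T @ r                          # gradient of 0.5 * ||r||^2
        if np.linalg.norm(g) < 1e-10:
            break                            # converged
        H = J.T @ J + mu * np.eye(x.size)    # damped Gauss-Newton system
        step = np.linalg.solve(H, -g)
        r_new = residual(x + step)
        if r_new @ r_new < r @ r:            # step reduced the cost
            x, mu = x + step, mu * 0.5       # trust the Gauss-Newton direction more
        else:
            mu *= 2.0                        # fall back toward steepest descent
    return x
```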
Disclosure of Invention
When a sensor acquires scan data, it is often affected by noise and by blur caused by its own motion, so the resulting 3D scan of the scene is noisy and incomplete, making the objects in the scene difficult to classify and model. During reconstruction, existing methods address this problem by replacing the objects in the scene scan with complete, lightweight CAD-model representations. However, those methods perform retrieval and matching on geometric information alone and do not use RGB information. This method therefore focuses on how to apply RGB information to the CAD-model retrieval and matching process. After model matching, it applies the idea of closed-loop control from classical control theory: it feeds back by comparing against the original scan data and iterates repeatedly until the required precision is reached, optimizing the overall layout so that reconstruction accuracy reaches a higher level.
The invention provides an indoor three-dimensional reconstruction method based on an RGB-D sensor. By introducing RGB information into object recognition and classification and using a graph attention module, it better avoids situations where noise in the original scan data makes object recognition unreliable. Unlike previous reconstruction pipelines, when the final reconstruction is complete it compares keypoint differences against the initial scanned scene and reduces the error through iterative optimization. The invention effectively addresses inaccurate classification and recognition caused by sensor noise and by blur from sensor motion, as well as insufficient reconstruction precision of the whole scene, and it increases the flexibility of the scene (the introduced CAD models can be freely edited).
An indoor three-dimensional reconstruction method based on an RGB-D sensor comprises the following steps (outlined as a pipeline sketch after this list):
Step 1: acquiring overall indoor 3D scan data using an RGB-D sensor;
Step 2: voxelizing the scene 3D scan data, the real object models in a real-object model library, and the CAD models in the ShapeNet dataset;
Step 3: applying a graph attention mechanism to reduce the difficulty of identifying objects caused by incomplete scans;
Step 4: combining color information with geometric information to identify the real object model corresponding to the object part in a scanned-scene voxel block;
Step 5: finding the CAD model closest to the corresponding real object model;
Step 6: replacing all objects in the original scene 3D scan data with the corresponding CAD models, then performing pose optimization;
Step 7: jointly optimizing the functional space and geometric space of the overall layout.
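Read as a pipeline, the seven steps chain together as in the outline below. Every function name here is a hypothetical placeholder standing in for the step described in the corresponding section, not an existing API.

```python
def reconstruct_indoor_scene(rgbd_frames, object_library, shapenet_cads):
    scan = fuse_rgbd(rgbd_frames)                        # step 1: whole-room 3D scan
    scan_vox = voxelize(scan)                            # step 2: SDF + RGB voxels
    lib_vox = [voxelize(m) for m in object_library]
    cad_vox = [voxelize(m) for m in shapenet_cads]
    scene = []
    for obj in segment_objects(scan_vox):
        feats = graph_attention(obj)                     # step 3: robust part features
        real_model = match_real_model(feats, lib_vox)    # step 4: RGB + geometry match
        cad = retrieve_nearest_cad(real_model, cad_vox)  # step 5: L2 feature retrieval
        pose = optimize_pose(cad, obj)                   # step 6: 9-DoF LM alignment
        scene.append((cad, pose))
    return refine_layout(scene, scan)                    # step 7: joint layout refinement
```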
The specific method of step 2 is as follows:
The scene 3D scan data is represented by voxels to obtain scanned-scene voxel blocks, which are encoded as a signed distance field (SDF). Voxelization relies on the combined information of RGB and the depth map, i.e., the voxels retain RGB information as well as geometric information. The real object models in a real-object model library crawled from the web and the CAD models in the ShapeNet dataset are likewise encoded as voxels. The item categories in the real-object model library correspond to the ShapeNet dataset.
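A minimal sketch of such a voxelization, assuming a normalized point cloud with per-point colors: each voxel stores a truncated distance to the nearest surface point plus that point's RGB value. The grid resolution and truncation are illustrative choices, and a full SDF would additionally recover the inside/outside sign from surface normals or camera visibility.

```python
import numpy as np
from scipy.spatial import cKDTree

def voxelize_sdf_rgb(points, colors, res=64, trunc=3.0):
    """points: (N, 3) in [0, 1]^3; colors: (N, 3) RGB in [0, 1]."""
    grid = (np.stack(np.meshgrid(*[np.arange(res)] * 3, indexing="ij"), -1)
            .reshape(-1, 3) + 0.5) / res             # voxel centres in [0, 1]^3
    d, idx = cKDTree(points).query(grid)             # nearest surface point per voxel
    # Unsigned distance truncated at `trunc` voxels; a full SDF would also
    # resolve the sign (inside vs. outside the surface).
    sdf = np.minimum(d * res, trunc).reshape(res, res, res).astype(np.float32)
    rgb = colors[idx].reshape(res, res, res, 3).astype(np.float32)
    return sdf, rgb
```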
The specific method of step 3 is as follows:
Through the graph attention mechanism, the non-defective parts of scanned objects in the scene 3D scan data carry large weights in recognition and classification, while the weights of defective (incomplete) parts are correspondingly reduced. A weight matrix represents the relation between the input and output features of all nodes, and this matrix is obtained by training. Objects are split into component parts, an incomplete object is recognized from the prior knowledge obtained from its other parts, and the mechanism is more sensitive to the features of the non-defective parts; combining color information then compensates for the negative effects of object incompleteness.
the specific method of the step 4 is as follows:
matching the cut object part of the voxelized scanning scene with the voxelized real object model through 3DCNN, using cross entropy as loss function, judging the matching probability of the voxels of the object part in the whole scene and the voxels of the real object model in a mode of outputting a heat map, bringing color information (RGB information) into the probability of judging whether the voxels are matched, and comparing the RGB values of input data and model data; the geometry information is processed in parallel with the RGB information, and finally both types of information are combined at each point through a full link layer. Finally, a real object model corresponding to the object part of the scanned scene voxel block is obtained; the probability of the heat map output is the probability that each point corresponds to a true object model voxel after the object model is pixelized, ranging from 0 to 1.
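The two-branch arrangement might look like the following PyTorch sketch: an SDF volume and an RGB volume pass through parallel 3D convolutional branches and are fused per voxel by a 1×1×1 convolution (the convolutional form of a fully connected layer) into a heat map of match probabilities. Channel counts and depths are illustrative assumptions. Training would pair the sigmoid output with a binary cross-entropy loss against the ground-truth correspondence volume.

```python
import torch
import torch.nn as nn

class MatchNet3D(nn.Module):
    """Sketch: parallel geometry and RGB branches fused into a heat map."""
    def __init__(self):
        super().__init__()
        def branch(in_ch):
            return nn.Sequential(
                nn.Conv3d(in_ch, 16, 3, padding=1), nn.ReLU(),
                nn.Conv3d(16, 32, 3, padding=1), nn.ReLU())
        self.geo = branch(1)                 # SDF volume branch
        self.rgb = branch(3)                 # RGB volume branch
        self.fuse = nn.Conv3d(64, 1, 1)      # per-voxel fully connected fusion

    def forward(self, sdf, rgb):
        # sdf: (B, 1, D, H, W); rgb: (B, 3, D, H, W)
        f = torch.cat([self.geo(sdf), self.rgb(rgb)], dim=1)
        return torch.sigmoid(self.fuse(f))   # (B, 1, D, H, W) heat map in [0, 1]
```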
The specific method of step 5 is as follows:
The CAD model closest to the corresponding real object model obtained in step 4 is found by encoding both as feature vectors, computing the L2 distance, and selecting the minimum pair; this process needs only geometric information. The cropped, voxelized object part of the scanned scene is then matched with the retrieved CAD model through a 3D CNN to obtain the corresponding heat map.
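In sketch form, the retrieval step reduces to a nearest-neighbor query in feature space; the encoder producing the vectors is assumed given.

```python
import numpy as np

def retrieve_nearest_cad(real_feat: np.ndarray, cad_feats: np.ndarray) -> int:
    # real_feat: (D,) embedding of the matched real object model;
    # cad_feats: (M, D) embeddings of the CAD model library.
    d = np.linalg.norm(cad_feats - real_feat, axis=1)  # L2 distance to each CAD model
    return int(d.argmin())                             # index of the closest CAD model
```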
The specific method of step 6 is as follows:
Registering the CAD model obtained in step 5 with the position of the original object in the scene 3D scan data requires a coordinate-system transformation: the position of the original object is converted from the world coordinate system (i.e., the coordinate system of the scanned scene) to the coordinate system of the CAD model, represented with a Lie algebra element a. In addition, there is a scale relation s between the CAD model and the object in the scene 3D scan data, expressed as a 3-dimensional vector (Sx, Sy, Sz), i.e., the size deviation in each direction. The scale s and the Lie algebra a are optimized jointly and represented by a 4×4 transformation matrix that contains the Lie algebra and a three-dimensional scale vector. On this basis an energy-function minimization problem is constructed: how to determine the rotation, translation, and scale of the CAD model so that it lies closer to the position of the object in the original 3D scanned scene. The Levenberg-Marquardt (LM) algorithm is applied to solve this problem, iterating the Lie algebra a and the 3-dimensional scale vector (Sx, Sy, Sz) until the solution minimizing the energy function is obtained. Finally, a form is obtained in which every object part in the original scene 3D scan data is replaced by its corresponding CAD-model representation, i.e., the replaced 3D scene representation.
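A hedged sketch of the 9-degree-of-freedom alignment, using SciPy's Levenberg-Marquardt solver over an axis-angle rotation a, a translation t, and a per-axis scale s. The weighted point-to-point residual stands in for the patent's exact energy function, with the heat-map values H_j acting as per-correspondence weights.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def transform(params, pts):
    a, t, s = params[:3], params[3:6], params[6:9]   # rotation, translation, scale
    return Rotation.from_rotvec(a).apply(pts * s) + t

def align_cad(scan_pts, cad_pts, weights):
    # scan_pts, cad_pts: (N, 3) corresponding points (N >= 3 for method="lm");
    # weights: heat-map values H_j for each correspondence.
    def residual(params):
        return (np.sqrt(weights)[:, None]
                * (transform(params, scan_pts) - cad_pts)).ravel()
    x0 = np.r_[np.zeros(6), np.ones(3)]              # identity pose, unit scale
    return least_squares(residual, x0, method="lm").x
```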
the specific method of step 7 is as follows:
and (3) comparing the 3D scene representation after the replacement with the original scanning, taking the difference of Euclidean distances of key points as an error, if the error value is greater than or equal to a set threshold value, the matching fails, and if the error value is smaller than the set threshold value, repeatedly iterating the scene representation obtained for the first time by the method in the step (6) until the value of the error function is smaller than the set minimum value, so that the optimization of the pose of the CAD model after the replacement is completed.
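This feedback loop can be outlined as below; `extract_keypoints` and `optimize_pose` are hypothetical placeholders for the keypoint extraction and the step-6 optimization, respectively.

```python
import numpy as np

def refine_until_converged(scene, scan, threshold, max_iters=50):
    err = np.inf
    for _ in range(max_iters):
        # Mean Euclidean keypoint distance between replaced scene and original scan.
        err = np.linalg.norm(extract_keypoints(scene)
                             - extract_keypoints(scan), axis=1).mean()
        if err < threshold:
            return scene, err                # pose optimization considered complete
        scene = optimize_pose(scene, scan)   # repeat the step-6 alignment
    return scene, err                        # matching failed to reach the threshold
```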
The invention has the following beneficial effects:
Innovation point 1: RGB information, i.e., color-feature retrieval, is innovatively integrated into the object CAD-model retrieval process. RGB data is generally more discriminative than depth or geometric information alone, so compared with traditional methods that use only geometric information, the matching effect improves markedly: an object model is matched through the combination of RGB and geometric information, and the CAD model is then matched through the object model.
Innovation point 2: after the whole process is finished, the result is compared with the original scan, and an error function (the degree of deviation of the regions weighted by the attention module over the object parts) is set to monitor reconstruction precision; if the standard is not reached, iteration is repeated until it is.
Innovation point 3: the invention introduces a graph attention mechanism. By training the node weights, it can complete object recognition and classification without reading scan data covering the whole object. Where parts of the scan are lost to sensor noise and similar causes, increasing the weights of the parts that are not lost raises the probability of retrieving a complete model, which is equivalent to performing a "completion".
Drawings
FIG. 1 is an overall flow chart of the method of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
As shown in FIG. 1, an indoor three-dimensional reconstruction method based on an RGB-D sensor includes the following steps:
Step 1: acquire overall indoor 3D scan data using an RGB-D sensor; note that large-amplitude movements of the device should be avoided as much as possible.
Step 2: represent the scene 3D scan data by voxels to obtain scanned-scene voxel blocks, encoded as a signed distance field (SDF). Voxelization relies on the combined information of RGB and the depth map: the voxels retain both geometric information and RGB information. The real object models in a real-object model library crawled from the web and the CAD models in the ShapeNet dataset are likewise encoded as voxels, which allows an analogy with classical two-dimensional image processing so that each point of a three-dimensional object can be matched. The item categories in the real-object model library correspond to the ShapeNet dataset.
Step 3: through the graph attention mechanism, the non-defective parts of scanned objects in the scene 3D scan data carry large weights in recognition and classification, while the weights of defective parts are correspondingly reduced. A weight matrix represents the relation between the input and output features of all nodes; this matrix is obtained by training on samples of non-defective scanned objects, with an increased penalty on misjudgments. Features are extracted separately (using PartNet) from the parts of each object, such as four legs and a flat surface, and the parts are combined into an object through a fully connected layer, so that, within a certain error, an object composed of, say, three legs and a flat surface is still accepted as a stool. This step mainly extracts the semantic information of the object and improves the efficiency of the next step of retrieving a real object model: since the object's semantic information is available, the search for a matching real object model need not traverse the whole model library but only the model set of the corresponding category (the network can infer that an object with, for example, four legs but no visible surface still has a high probability of being a stool). An object is split into components — a chair, for instance, into a backrest, a seat, and four legs — and even if only three legs are visible and the back is incomplete, the chair can still be identified from the prior knowledge obtained from the other parts. The mechanism is also more sensitive to the features of the non-defective parts, and color information is combined to compensate for the negative effects of object incompleteness. For example, if one third of a table is missing, the remaining two thirds are enough to find the corresponding object in the real-object model library.
Step 4: crop the object part out of the voxelized scanned scene (during training, by selecting a point at the center of an object in the scanned scene and cutting out the 64×64×64 region around it) and match it against the voxelized real object models through a 3D CNN (analogous to object recognition in a two-dimensional image), using cross entropy as the loss function and outputting a heat map. The probability output by the heat map is the probability, ranging from 0 to 1, that each point corresponds to a voxel of the voxelized real object model. Color information (i.e., RGB information) is incorporated into the matching probability by comparing the RGB values of the input data and the model data (even if two tables are geometrically similar, different tables are not completely identical, so applying color features helps identify more accurately exactly which kind of table it is). The geometric information is processed in parallel with the RGB information, and the two kinds of information are finally combined at each point through a fully connected layer. In the end, the real object model corresponding to the object part of the scanned-scene voxel block is obtained.
Step 5: find the CAD model closest to the corresponding real object model obtained in step 4 by encoding both as feature vectors, computing the L2 distance, and selecting the minimum pair; this process needs only geometric information. Then match the cropped, voxelized object part of the scanned scene with the retrieved CAD model through a 3D CNN to obtain the corresponding heat map. The significance of this is that directly matching the CAD model is more accurate than first matching the real object model and then indirectly matching the CAD model; moreover, the subsequent pose-optimization process needs the relation — the heat map — between the voxel blocks of the voxelized object in the scanned scene and the voxelized CAD model, and it optimizes the corresponding positions of the voxel blocks so as to adjust the pose and size of the whole CAD model.
Step 6: since an exactly matching CAD model is not necessarily in the model library, registering the corresponding CAD model obtained in step 5 with the position of the original object in the scene 3D scan data requires a coordinate-system transformation. The position of the original object in the scene 3D scan data is converted from the world coordinate system (the coordinate system of the scanned scene) to the coordinate system of the CAD model, analogous to the transformation between the camera coordinate system and world coordinates (the initial data set carries corresponding pose labels of the standard rigid form p' = R·p + t, where R is a rotation matrix and t is a translation); the transformation is represented with a Lie algebra element a. A point of the scanned object is denoted as a pair (p_j, H_j), where p_j is a point of an object in the scanned scene (a voxel block; a voxel block in three-dimensional space corresponds to a pixel of a two-dimensional image) and H_j is a probability, again ranging from 0 to 1, that this voxel block of the object in the scanned scene is a voxel block on the CAD model (the input is a region cropped from the scanned scene, centered on the object, so it also contains voxel blocks belonging to parts of the surrounding scene). In addition, there is a scale relation s between the CAD model and the object in the scene 3D scan data, expressed as a 3-dimensional vector (Sx, Sy, Sz), i.e., the size deviation in each direction. The scale s and the Lie algebra a are combined and optimized jointly, represented by a 4×4 transformation matrix T_mi(a, s) that contains the Lie algebra and the three-dimensional scale vector, where m_i denotes the i-th CAD model; the mapping (a, s) → T_mi(a, s) converts the six-dimensional Lie algebra (three-dimensional rotation, three-dimensional translation) and the three-dimensional scale vector into a 4×4 matrix. Applying c_vox = T_worldvox · T_mi(a, s) · p_j, where T_worldvox transforms the world coordinate system into the coordinate system of the CAD model, converts a voxel point p_j in the scanned scene into point coordinates in the voxelized CAD model. On this basis, an energy-function minimization problem is constructed — how to determine the rotation, translation, and scale of the CAD model to ensure that it lies closer to the position of the object in the original 3D scanned scene. The objective f is built from the heat-map values H_j(c_vox) and is to be made as small as possible: under the transformation of a and s, the voxel coordinates in the CAD-model coordinate system should come as close as possible to the points on the CAD model that the heat map previously determined to correspond to the objects in the scanned scene. Since the closeness of corresponding points decides whether the pose transformation is complete, the Levenberg-Marquardt (LM) algorithm is applied to solve this problem, continuously iterating the Lie algebra a and the 3-dimensional scale vector (Sx, Sy, Sz) to obtain the solution that minimizes the energy function (these two quantities are the key to pose optimization). Finally, a form is obtained in which all object parts in the original scene 3D scan data are replaced by their corresponding CAD-model representations, i.e., the replaced 3D scene representation. Because objects in the scene may be incompletely scanned owing to the sensor, replacing them with CAD models achieves a complete, clean, and lightweight standard; and because CAD models can be freely edited, the scene can be represented more flexibly.
Step 7: compare the replaced 3D scene representation with the original scan, taking the difference of the Euclidean distances of key points (for example, object corner or edge points) as the error (essentially the same criterion as the previous step; it judges, from a visual standpoint, whether the result after the previous step's algorithm has converged can be accepted). If the error value is greater than or equal to a set threshold, the matching fails; if the error value is smaller than the set threshold, the scene representation obtained the first time is iterated repeatedly by the method of step 6 until the value of the error function is smaller than the set minimum, whereupon the optimization of the pose of the replaced CAD model is considered complete.

Claims (7)

1. An indoor three-dimensional reconstruction method based on an RGB-D sensor, characterized by comprising the following steps:
Step 1: using an RGB-D sensor to obtain overall indoor 3D scan data;
Step 2: voxelizing the scene 3D scan data, the real object models in a real-object model library, and the CAD models in the ShapeNet dataset;
Step 3: applying a graph attention mechanism to reduce the difficulty of identifying objects caused by incomplete scans;
Step 4: combining color information with geometric information to identify the real object model corresponding to the object part in a scanned-scene voxel block;
Step 5: finding the CAD model closest to the corresponding real object model;
Step 6: replacing all objects in the original scene 3D scan data with the corresponding CAD models, and performing pose optimization after the replacement is completed;
Step 7: jointly optimizing the functional space and geometric space of the overall layout.

2. The indoor three-dimensional reconstruction method based on an RGB-D sensor according to claim 1, characterized in that the specific method of step 2 is as follows: the scene 3D scan data is represented by voxels to obtain scanned-scene voxel blocks, which are encoded as a signed distance field (SDF); voxelization relies on the combined information of RGB and the depth map, i.e., the voxels retain RGB information as well as geometric information; the real object models in the real-object model library crawled from the web and the CAD models in the ShapeNet dataset are likewise encoded as voxels; the item categories in the real-object model library correspond to the ShapeNet dataset.

3. The indoor three-dimensional reconstruction method based on an RGB-D sensor according to claim 2, characterized in that the specific method of step 3 is as follows: through the graph attention mechanism, the non-defective parts of scanned objects in the scene 3D scan data carry large weights in recognition and classification, while the weights of defective parts are correspondingly reduced; a weight matrix represents the relation between the input and output features of all nodes and is obtained by training; objects are split into component parts, an incomplete object is recognized from the prior knowledge obtained from its other parts, and the mechanism is more sensitive to the non-defective parts, so color information is combined to compensate for the negative effects of object incompleteness.

4. The indoor three-dimensional reconstruction method based on an RGB-D sensor according to claim 3, characterized in that the specific method of step 4 is as follows: the object part cropped from the voxelized scanned scene is matched against the voxelized real object models through a 3D CNN, using cross entropy as the loss function; by outputting a heat map, the matching probability between the voxels of the object part in the whole scene and the voxels of a real object model is judged; color information (RGB information) is also incorporated into the matching probability by comparing the RGB values of the input data and the model data; the geometric information is processed in parallel with the RGB information, and the two kinds of information are finally combined at each point through a fully connected layer; the real object model corresponding to the object part of the scanned-scene voxel block is finally obtained; the probability output by the heat map is the probability, ranging from 0 to 1, that each point corresponds to a voxel of the voxelized real object model.

5. The indoor three-dimensional reconstruction method based on an RGB-D sensor according to claim 4, characterized in that the specific method of step 5 is as follows: the CAD model closest to the corresponding real object model obtained in step 4 is found by encoding both as feature vectors, computing the L2 distance, and selecting the minimum pair, a process that uses only geometric information; the cropped, voxelized object part of the scanned scene is then matched with the retrieved CAD model through a 3D CNN to obtain the corresponding heat map.

6. The indoor three-dimensional reconstruction method based on an RGB-D sensor according to claim 5, characterized in that the specific method of step 6 is as follows: registering the CAD model obtained in step 5 with the position of the original object in the scene 3D scan data requires a coordinate-system transformation, converting the position of the original object from the world coordinate system (the coordinate system of the scanned scene) to the coordinate system of the CAD model, represented with a Lie algebra a; in addition, there is a scale relation s between the CAD model and the object in the scene 3D scan data, expressed as a 3-dimensional vector (Sx, Sy, Sz), i.e., the size deviation in each direction; the scale s and the Lie algebra a are optimized jointly, represented by a 4×4 transformation matrix that includes the Lie algebra and a three-dimensional scale vector; on this basis an energy-function minimization problem is constructed, i.e., how to determine the rotation, translation, and scale of the CAD model so that it lies closer to the position of the object in the original 3D scanned scene; the Levenberg-Marquardt (LM) algorithm is applied to solve this problem, iterating the Lie algebra and the 3-dimensional scale vector (Sx, Sy, Sz) to obtain the solution that minimizes the energy function; finally, a form in which all object parts in the original scene 3D scan data are replaced by the corresponding CAD-model representations is obtained, i.e., the replaced 3D scene representation.

7. The indoor three-dimensional reconstruction method based on an RGB-D sensor according to claim 6, characterized in that step 7 is as follows: the replaced 3D scene representation is compared with the original scan, taking the difference of the Euclidean distances of key points as the error; if the error value is greater than or equal to a set threshold, the matching fails; if the error value is smaller than the set threshold, the scene representation obtained the first time is iterated repeatedly by the method of step 6 until the value of the error function is smaller than the set minimum, whereupon the optimization of the pose of the replaced CAD model is complete.
CN202110441618.1A 2021-04-23 2021-04-23 Indoor three-dimensional reconstruction method based on RGB-D sensor Withdrawn CN113160411A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110441618.1A CN113160411A (en) 2021-04-23 2021-04-23 Indoor three-dimensional reconstruction method based on RGB-D sensor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110441618.1A CN113160411A (en) 2021-04-23 2021-04-23 Indoor three-dimensional reconstruction method based on RGB-D sensor

Publications (1)

Publication Number Publication Date
CN113160411A true CN113160411A (en) 2021-07-23

Family

ID=76869824

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110441618.1A Withdrawn CN113160411A (en) 2021-04-23 2021-04-23 Indoor three-dimensional reconstruction method based on RGB-D sensor

Country Status (1)

Country Link
CN (1) CN113160411A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023035548A1 (en) * 2021-09-09 2023-03-16 上海商汤智能科技有限公司 Information management method for target environment and related augmented reality display method, electronic device, storage medium, computer program, and computer program product
CN114119930A (en) * 2022-01-27 2022-03-01 广州中望龙腾软件股份有限公司 Three-dimensional model correction method and device based on deep learning and storage medium
CN117710469A (en) * 2024-02-06 2024-03-15 四川大学 An online dense reconstruction method and system based on RGB-D sensors
CN117710469B (en) * 2024-02-06 2024-04-12 四川大学 An online dense reconstruction method and system based on RGB-D sensor

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (Application publication date: 20210723)