Point cloud part level segmentation method based on PointNet graph convolution and KNN search
Technical Field
The invention belongs to the technical field of point cloud object segmentation, and particularly relates to a point cloud part level segmentation method based on PointNet graph convolution and KNN search.
Background
Analysis and understanding of three-dimensional shapes has long been an important research topic in computer graphics. In recent years, the popularization of depth sensors and three-dimensional laser scanners has promoted the rapid development of three-dimensional point cloud processing methods. Part-level segmentation of three-dimensional point cloud objects, as the basis of 3D scene understanding and analysis, has become a research focus in fields such as navigation and positioning, medical image analysis, and pattern recognition, and has important research value and wide application prospects.
Point cloud part-level segmentation divides the point cloud into a plurality of regions with detailed semantic categories. A basic point cloud contains 6-dimensional features: XYZ coordinates and RGB color information. With the appearance of large-scale data sets, the reduction of computer hardware cost and the improvement of GPU parallel computing capability, deep learning has gradually occupied a dominant position in the field of point cloud segmentation. Deep learning pioneered by the PointNet network has been studied and used by more and more scholars: the network model is very simple and can directly operate on the original point cloud data according to the characteristics of point cloud data; meanwhile, max pooling is used as a symmetric function to handle the unordered nature of the point cloud, and two T-Net networks are adopted to make the model rotation invariant.
The PointNet network mainly processes points independently on a local scale, which preserves permutation invariance but does not consider the interaction of adjacent points, so local features describing the geometric relations among nodes are lost. On this basis, the PointNet network is optimized, and the part-level segmentation method based on PointNet graph convolution and KNN search aggregates the neighborhood features of the point cloud more efficiently.
Disclosure of Invention
The invention aims to provide a point cloud part level segmentation method based on PointNet graph convolution and KNN search, which solves the problem of low semantic segmentation precision caused by the PointNet network extracting only the features of single points.
The technical scheme adopted by the invention is that the point cloud part level segmentation method based on PointNet graph convolution and KNN search is implemented according to the following steps:
step 1, point cloud space alignment: inputting point cloud data, predicting an affine transformation matrix by using a T-Net micro network, and performing coordinate alignment on the input point cloud data by using the matrix;
step 2, local feature extraction: in order to capture local features, each point in the step 1 is taken as a vertex, k points with the nearest distance are selected to construct a k nearest neighborhood graph, edge convolution is applied to edges connecting adjacent point pairs to obtain edge feature information between every two point pairs, and then local feature matrix information is extracted through MLP operation of a multilayer perceptron network;
step 3, dynamically updating the local neighborhood map: calculating and updating a k neighbor graph of each layer of the MLP according to the embedding sequence, and extracting updated local feature matrix information;
step 4, point cloud feature transformation: designing a new T-Net mini network, inputting the updated local feature matrix information into the new T-Net network for coordinate alignment, and ensuring the rotation invariance of feature points;
and step 5, fusing local features and global features: carrying out MaxPooling maximum pooling operation on the aligned and updated local feature matrix information, and processing the resulting global feature information through several MLPs to obtain the category score of each object, thereby realizing the part-level segmentation of the three-dimensional object.
The present invention is also characterized in that,
the step 1 specifically comprises the following steps:
step 1.1, inputting point cloud data P = {p_1, p_2, …, p_i, …, p_N} of a three-dimensional object, where p_i indicates the position information of the i-th point, N = point_num is the number of points of the object, N = 2048, the batch size is 4, and the training period epoch is 200;
step 1.2, in order to make the model have arrangement invariance to the input, the information of each point is aggregated by using a symmetric function to obtain an affine matrix A:
f({p_1, …, p_N}) ≈ g(h(p_1), …, h(p_N)),  (1)
in the formula (1), h represents a multilayer perceptron network, g represents a maximum pooling function, and f represents the characteristic information of a captured point cloud set;
and step 1.3, after the input point cloud data is multiplied by the affine matrix A, the input point cloud coordinate alignment is realized through a plurality of MLP processing.
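The symmetric aggregation of formula (1) can be sketched in numpy; a single random linear layer with ReLU stands in for the multilayer perceptron h (the actual layer widths of the invention's T-Net are not specified here and are an assumption), and max pooling plays the role of g:

```python
import numpy as np

rng = np.random.default_rng(0)

N, F_in, F_out = 2048, 3, 64           # 2048 points, xyz in, 64-d features out
points = rng.standard_normal((N, F_in))

# h: a shared per-point transform (a random linear layer + ReLU as a stand-in
# for the MLP), applied identically to every point
W = rng.standard_normal((F_in, F_out))
h = np.maximum(points @ W, 0.0)        # (N, F_out)

# g: max pooling over the point dimension -- a symmetric function, so the
# result is invariant to any permutation of the input points
f = h.max(axis=0)                      # (F_out,)

# shuffling the points leaves the aggregated feature unchanged
perm = rng.permutation(N)
f_perm = np.maximum(points[perm] @ W, 0.0).max(axis=0)
```

Because g is symmetric, f({p_1, …, p_N}) does not depend on the input order, which is exactly the permutation invariance sought in step 1.2.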
The step 2 specifically comprises the following steps:
step 2.1, selecting the number of nearest neighbor points k = 20, with the Euclidean distance as the metric;
step 2.2, each point p_i of the point cloud data obtained after the alignment in step 1 and its k nearest neighbors q_j, j = 1, 2, …, k, construct a k-nearest-neighborhood graph, where p_i ∈ R^F, and R^F represents an F-dimensional input point cloud matrix.
step 2.3, the k-neighborhood graph structure is G = (V, E), where V = {p_i | i = 1, …, N} represents the set of vertices, E = {e_i = (e_i1, e_i2, …, e_ij, …, e_ik) | i = 1, 2, …, N} represents the set of edges between vertices, and e_ij represents the directed edge from point p_i to its j-th nearest neighbor;
and step 2.4, performing MLP convolution processing on the k-nearest-neighborhood graph obtained in step 2.3 to obtain the local feature matrix information of the three-dimensional point cloud object.
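Steps 2.1 to 2.3 can be sketched as follows; this is a minimal numpy sketch, not the invention's actual network, and the smaller point count N = 256 is chosen only for brevity. The edge feature pairs each center point with its relative offset to a neighbor, as is common in edge convolution:

```python
import numpy as np

rng = np.random.default_rng(1)
N, F, k = 256, 3, 20                   # fewer points than the paper's 2048, for brevity
P = rng.standard_normal((N, F))

# pairwise squared Euclidean distances, then the k nearest neighbors per vertex
d2 = ((P[:, None, :] - P[None, :, :]) ** 2).sum(-1)   # (N, N)
np.fill_diagonal(d2, np.inf)                          # exclude the point itself
knn_idx = np.argsort(d2, axis=1)[:, :k]               # (N, k) neighbor indices

# edge feature for the directed edge p_i -> q_j: concatenate the center point
# with the relative offset (q_j - p_i)
centre = np.repeat(P[:, None, :], k, axis=1)          # (N, k, F)
offset = P[knn_idx] - centre                          # (N, k, F)
edge_feat = np.concatenate([centre, offset], axis=-1) # (N, k, 2F)
```

The (N, k, 2F) edge-feature tensor is what an MLP would then process in step 2.4 to produce the local feature matrix.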
The step 3 is as follows:
step 3.1, the MLP comprises two convolution layers, a batch normalization layer and an activation layer; the convolution kernel sizes of the two convolution layers are 64 and 64 from left to right, and the output point cloud data of each layer of the MLP network is indexed by l, where l represents the l-th layer of the MLP network;
step 3.2, obtaining a different output k-nearest-neighborhood graph G^l = (V^l, E^l) for each MLP layer according to the output point cloud data of that layer, where G^l represents the k-neighborhood graph output by the l-th layer, V^l represents the set of vertices of G^l, and E^l represents the set of edges between the vertices of G^l;
and 3.3, acquiring local feature matrix information of each layer according to the k near neighborhood graph output by each layer.
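The dynamic update of step 3 can be sketched as recomputing the k-nearest-neighbor graph in the feature space produced by each layer; the max-over-neighborhood aggregation and random weights below are stand-ins for the invention's edge convolution and trained MLP layers, not the actual implementation:

```python
import numpy as np

def knn_graph(X, k):
    """Indices of the k nearest neighbors of each row of X (Euclidean)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)
    return np.argsort(d2, axis=1)[:, :k]

rng = np.random.default_rng(2)
N, k = 128, 20
X = rng.standard_normal((N, 3))

graphs = []
for width in [64, 64]:                            # two layers of width 64
    idx = knn_graph(X, k)                         # graph G^l recomputed per layer
    graphs.append(idx)
    # stand-in per-layer transform: max over each neighborhood's features
    W = rng.standard_normal((X.shape[1], width))
    X = np.maximum(X[idx] @ W, 0.0).max(axis=1)   # (N, width)
```

Because the graph of a later layer is built from that layer's embeddings rather than the raw coordinates, the neighborhoods generally differ between layers, which is the point of the dynamic update.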
The newly designed T-Net network is characterized in that a regularization term, shown in formula (3), is added to the softmax training loss of the conventional T-Net network framework, constraining the feature transformation matrix to be close to an orthogonal matrix:
L_reg = ||I − A·A^T||_F^2,  (3)
in formula (3), A is the affine matrix predicted by the T-Net network (an orthogonal transformation does not lose input information), I is the F×F identity matrix, and F represents the point cloud feature dimension.
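The orthogonality regularization term can be sketched directly; the 64-dimensional feature size below is an assumption for illustration:

```python
import numpy as np

def transform_regularizer(A):
    """||I - A A^T||_F^2 -- zero exactly when A is orthogonal."""
    F = A.shape[0]
    diff = np.eye(F) - A @ A.T
    return (diff ** 2).sum()

# an orthogonal matrix (the Q factor of a QR decomposition) incurs no penalty
Q, _ = np.linalg.qr(np.random.default_rng(3).standard_normal((64, 64)))

# while a non-orthogonal matrix is penalized: for 2Q, I - 4QQ^T = -3I,
# so the penalty is 9 * 64 = 576
penalty = transform_regularizer(2.0 * Q)
```

During training this term would be added, with some weight, to the softmax loss, steering the predicted feature transformation toward an orthogonal one.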
The step 5 is as follows:
step 5.1, using the aligned and updated local feature matrix information as input, and performing convolution operation processing through 3 spatial_transform modules to obtain three pieces of local feature information net1, net2, net3 respectively;
step 5.2, splicing the three local features net1, net2, net3 corresponding to the 3 spatial_transform modules, and outputting out;
step 5.3, carrying out MaxPooling maximum pooling operation on the output out data to obtain the global features of the current layer;
step 5.4, splicing the global features of the current layer with the label information of the input point cloud in the data set to obtain the global feature global_feature of the whole network;
step 5.5, splicing the global feature global_feature with net1, net2, net3;
and step 5.6, carrying out MLP multilayer perceptron processing on the data spliced in step 5.5 to obtain the category scores of all the objects, realizing the part-level segmentation of the three-dimensional objects.
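The fusion pipeline of step 5 can be sketched with numpy stand-ins. The 64-wide local features, the 16-way object-category one-hot label, and the 50 part categories are assumptions borrowed from the common ShapeNet part setup, and a random linear map stands in for the final MLP:

```python
import numpy as np

rng = np.random.default_rng(4)
N, num_parts = 1024, 50
net1 = rng.standard_normal((N, 64))     # local features from the 3 modules
net2 = rng.standard_normal((N, 64))
net3 = rng.standard_normal((N, 64))
one_hot_label = np.zeros(16)            # object-category label of the cloud
one_hot_label[2] = 1.0

out = np.concatenate([net1, net2, net3], axis=1)               # step 5.2: (N, 192)
layer_global = out.max(axis=0)                                 # step 5.3: (192,)
global_feature = np.concatenate([layer_global, one_hot_label]) # step 5.4: (208,)

# step 5.5: tile the global feature and splice it back onto every point
tiled = np.repeat(global_feature[None, :], N, axis=0)
fused = np.concatenate([tiled, net1, net2, net3], axis=1)      # (N, 400)

# step 5.6: stand-in MLP producing a part score per point
W = rng.standard_normal((fused.shape[1], num_parts))
scores = fused @ W                                             # (N, num_parts)
pred = scores.argmax(axis=1)                                   # part label per point
```

Splicing the tiled global feature back onto every point is what lets the per-point classifier combine local geometry with shape-level context.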
The invention has the beneficial effects that:
the invention relates to a point cloud part level segmentation method based on PointNet graph convolution and KNN search.A point cloud rotation invariance is ensured through a micro network T-Net (input _ transform); then, calculating Euclidean distance between every two point pairs, selecting k points with the nearest distance, and taking point piConstructing a k near neighborhood graph for the central point to extract local features, and dynamically updating the k neighborhood graph for each layer in subsequent model training to obtain local feature information; next, aligning the local feature data of different point clouds through a T-Net (feature _ transform) micro network; and finally, splicing the output multilayer local features and the global features, and then obtaining a part grade segmentation result of the three-dimensional point cloud object through a plurality of MLP (multilayer perceptron) operations.
Drawings
FIG. 1 is a network architecture diagram of a point cloud part level segmentation method based on PointNet's graph convolution and KNN search;
FIG. 2 is a network diagram of the improved graph convolution and KNN search edge feature (edge_feature) extraction in the point cloud part level segmentation method based on PointNet graph convolution and KNN search;
FIG. 3 is a result of part-level segmentation of a three-dimensional object in a ShapeNetCore dataset;
FIG. 4(a) is a graph of the loss during ShapeNetCore dataset training;
FIG. 4(b) is a graph of the accuracy during ShapeNetCore dataset training.
Detailed Description
The present invention will be described in detail with reference to the following embodiments.
Examples
The embodiment provides a point cloud part level segmentation method based on PointNet graph convolution and KNN search, which is specifically implemented according to the following steps as shown in FIG. 1:
step 1, point cloud space alignment: inputting point cloud data, predicting an affine transformation matrix by using a T-Net micro network, and performing coordinate alignment on the input point cloud data by using the matrix;
step 1.1, inputting point cloud data P = {p_1, p_2, …, p_i, …, p_N} of a three-dimensional object, where p_i indicates the position information of the i-th point, N = point_num is the number of points of the object, N = 2048, the batch size is 4, and the training period epoch is 200;
step 1.2, in order to make the model have arrangement invariance to the input, the information of each point is aggregated by using a symmetric function to obtain an affine matrix A:
f({p_1, …, p_N}) ≈ g(h(p_1), …, h(p_N)),  (1)
in the formula (1), h represents a multilayer perceptron network, g represents a maximum pooling function, and f represents the characteristic information of a captured point cloud set;
and step 1.3, after the input point cloud data is multiplied by the affine matrix A, the input point cloud coordinate alignment is realized through a plurality of MLP processing.
Step 2, local feature extraction: in order to capture local features, each point in the step 1 is taken as a vertex, k points with the nearest distance are selected to construct a k nearest neighborhood graph, edge convolution is applied to edges connecting adjacent point pairs to obtain edge feature information between every two point pairs, and then local feature matrix information is extracted through MLP operation of a multilayer perceptron network;
step 2.1, selecting the number of nearest neighbor points k = 20, with the Euclidean distance as the metric;
step 2.2, each point p_i of the point cloud data obtained after the alignment in step 1 and its k nearest neighbors q_j, j = 1, 2, …, k, construct a k-nearest-neighborhood graph, where p_i ∈ R^F, and R^F represents an F-dimensional input point cloud matrix.
step 2.3, the k-neighborhood graph structure is G = (V, E), where V = {p_i | i = 1, …, N} represents the set of vertices, E = {e_i = (e_i1, e_i2, …, e_ij, …, e_ik) | i = 1, 2, …, N} represents the set of edges between vertices, and e_ij represents the directed edge from point p_i to its j-th nearest neighbor;
and step 2.4, performing MLP convolution processing on the k-nearest-neighborhood graph obtained in step 2.3 to obtain the local feature matrix information of the three-dimensional point cloud object.
Step 3, dynamically updating the local neighborhood map: calculating and updating a k neighbor graph of each layer of the MLP according to the embedding sequence, and extracting updated local feature matrix information;
step 3.1, the MLP comprises two convolution layers, a batch normalization layer and an activation layer; the convolution kernel sizes of the two convolution layers are 64 and 64 from left to right, and the output point cloud data of each layer of the MLP network is indexed by l, where l represents the l-th layer of the MLP network;
step 3.2, obtaining a different output k-nearest-neighborhood graph G^l = (V^l, E^l) for each MLP layer according to the output point cloud data of that layer, where G^l represents the k-neighborhood graph output by the l-th layer, V^l represents the set of vertices of G^l, and E^l represents the set of edges between the vertices of G^l;
and 3.3, acquiring local feature matrix information of each layer according to the k near neighborhood graph output by each layer.
Step 4, point cloud feature transformation: designing a new T-Net mini network, inputting the updated local feature matrix information into the new T-Net network for coordinate alignment, and ensuring the rotation invariance of feature points;
The newly designed T-Net network is characterized in that a regularization term, shown in formula (3), is added to the softmax training loss of the conventional T-Net network framework, constraining the feature transformation matrix to be close to an orthogonal matrix:
L_reg = ||I − A·A^T||_F^2,  (3)
in formula (3), A is the affine matrix predicted by the T-Net network (an orthogonal transformation does not lose input information), I is the F×F identity matrix, and F represents the point cloud feature dimension.
And step 5, fusing local features and global features: carrying out MaxPooling maximum pooling operation on the aligned and updated local feature matrix information, and processing the resulting global feature information through several MLPs to obtain the category score of each object, thereby realizing the part-level segmentation of the three-dimensional object.
step 5.1, using the aligned and updated local feature matrix information as input, and performing convolution operation processing through 3 spatial_transform modules to obtain three pieces of local feature information net1, net2, net3 respectively; the spatial_transform module, shown in FIG. 2, is composed of an edge convolution module and a plurality of MLP multilayer perceptron modules;
step 5.2, splicing the three local features net1, net2, net3 corresponding to the 3 spatial_transform modules, and outputting out;
step 5.3, carrying out MaxPooling maximum pooling operation on the output out data to obtain the global features of the current layer;
step 5.4, splicing the global features of the current layer with the label information of the input point cloud in the data set to obtain the global feature global_feature of the whole network;
step 5.5, splicing the global feature global_feature with net1, net2, net3;
and step 5.6, carrying out MLP multilayer perceptron processing on the data spliced in step 5.5 to obtain the category scores of all the objects, realizing the part-level segmentation of the three-dimensional objects, as shown in FIG. 3.
In the training process, the cross entropy loss function is used to learn the parameters, so that the model reaches a convergence state and the error of the model's predicted values is reduced. As shown in FIG. 4, FIG. 4(a) shows the loss curve during training and FIG. 4(b) shows the accuracy curve; as the accuracy on the training set increases with increasing epoch, the loss steadily decreases, which indicates that the learning model of the present invention adapts well to the training set.
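The cross entropy loss used in training can be sketched as a log-softmax over the per-point part scores followed by a negative log-likelihood; this is a generic numpy sketch of the standard loss, not the invention's training code:

```python
import numpy as np

def cross_entropy(scores, labels):
    """Mean cross-entropy between per-point scores and integer part labels."""
    # numerically stable log-softmax over the class dimension
    z = scores - scores.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

rng = np.random.default_rng(5)
scores = rng.standard_normal((2048, 50))     # one 50-way score row per point
labels = rng.integers(0, 50, size=2048)
loss = cross_entropy(scores, labels)

# a confident correct prediction drives the loss towards zero
perfect = np.full((4, 3), -1e3)
perfect[np.arange(4), [0, 1, 2, 0]] = 1e3
perfect_loss = cross_entropy(perfect, np.array([0, 1, 2, 0]))
```

Minimizing this quantity over the training set is what the epochs of FIG. 4 track: as the per-point scores concentrate on the correct part labels, the loss falls and the accuracy rises.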
Since the point cloud essentially lacks topological information, a graph convolutional neural network, which runs directly on a graph structure and captures the dependency relations in the graph through information transfer among nodes, is well suited to this task. Aiming at the limitation of local feature extraction in the PointNet framework, the method uses a graph convolutional neural network to extract features from each center point and the edge vectors between the center point and its k neighborhood points to obtain the local features of the point cloud, effectively solving the problem that the PointNet network cannot extract local structure.