Pose state-based laser radar point cloud inter-frame encoding and decoding method
Technical Field
The invention belongs to the technical field of image coding, and particularly relates to a pose state-based laser radar point cloud inter-frame coding and decoding method.
Background
A point cloud is a massive set of points that expresses the spatial distribution and surface characteristics of a target under the same spatial reference system. A point cloud generally consists of two parts, geometric information and attribute information: the geometric information refers to position information in three-dimensional space, and the attribute information includes reflectance, color, and the like. Because a point cloud can accurately express the information of a scene or object, point clouds are widely applied in fields such as intelligent driving, virtual reality, augmented reality, and map modeling. However, the data volume of a point cloud is huge, and its sparsity and disorder make storage and transmission very challenging, so exploring and researching efficient point cloud coding methods is very important.
In a continuous point cloud sequence, the interval between two adjacent point cloud frames is short, and the laser radar moves only a small distance. Therefore, adjacent point cloud frames share largely similar spatial structures, objects in the scene undergo only small positional changes, and a large amount of redundancy exists in the time dimension. The Chinese patent with publication number CN111899152A discloses a point cloud data compression method based on projection and video stitching. That method analyzes the geometric and attribute characteristics of the point cloud data to determine a corresponding projection strategy and an optimal projection angle, obtains a group of spatially correlated two-dimensional pictures by projecting a projection area multiple times at the same projection angle, and stitches the group of two-dimensional pictures into a video file. However, that method is suited to dense point cloud data and generalizes poorly across point cloud types.
Disclosure of Invention
In order to solve the technical problems, the invention provides a pose state-based laser radar point cloud inter-frame coding method, which can reduce time domain redundancy in a point cloud sequence and improve data compression performance by means of pose relations of adjacent frame point clouds.
The technical scheme of the invention is as follows:
A pose state-based laser radar point cloud inter-frame coding method comprises the following steps:
The encoding process comprises: mapping the geometric information of the three-dimensional point cloud into a two-dimensional distance map; obtaining the pose state and calculating a transformation matrix M from the pose state; operating on the I frame with the transformation matrix M to obtain a predicted distance map; obtaining residual coding information for the P frame and I frame coding information from the predicted distance map; and synthesizing the transformation matrix M and the coding information into an output code stream;
The decoding process sequentially recovers the transformation matrix, the residual coding information, and the reference point cloud coding information from the code stream to obtain the original distances of the reference point cloud and the point cloud to be encoded on the two-dimensional distance map, and finally converts the distance maps back into a three-dimensional point cloud.
Further, the pose relationship is obtained according to the IMU information in the Kitti dataset.
Further, the reference point cloud is an I-frame point cloud and the point cloud to be encoded is a P-frame point cloud, and the point cloud sequence is encoded in the form "IPPPIPPP…".
Further, the specific steps of the encoding process are as follows:
S1, dividing the point cloud sequence into a reference point cloud and a point cloud to be encoded, and mapping the geometric information of the reference point cloud and the point cloud to be encoded into two-dimensional distance maps, respectively obtaining an I frame original distance map of the reference point cloud and a P frame original distance map of the point cloud to be encoded; the point cloud sequence is encoded in the "IPPPIPPP…" form;
S2, acquiring the pose state according to the IMU information in the Kitti dataset, and calculating a transformation matrix M from the pose state;
S3, operating on the reference point cloud I frame with the transformation matrix M to obtain a predicted distance map of the P frame point cloud to be encoded;
S4, taking the difference between the predicted distance map obtained in step S3 and the original distance map of the P frame in step S1 to obtain a prediction residual;
S5, carrying out quantization processing and JPEG-LS lossless coding on the prediction residual obtained in step S4 to obtain residual coding information;
S6, the original distance map of the I frame in the step S1 is also encoded by JPEG-LS to obtain I frame encoding information;
S7, synthesizing the transformation matrix M in step S2, the residual coding information in step S5, and the I frame coding information in step S6 into an output code stream.
Further, the specific steps of the decoding process are as follows:
M1, after receiving the code stream, taking out the I frame coding information in the code stream and obtaining the I frame distance map through JPEG-LS decoding;
M2, taking out the transformation matrix M in the code stream and applying it to the I frame distance map obtained in step M1 to obtain a predicted distance map;
M3, taking out the residual coding information in the code stream and performing inverse quantization and JPEG-LS decoding on it to obtain the prediction residual;
M4, reconstructing the P frame distance map by combining the prediction residual obtained in step M3 with the predicted distance map obtained in step M2;
M5, converting the I frame distance map obtained in step M1 and the P frame distance map obtained in step M4 into point clouds, finally recovering the two-dimensional distance maps into the three-dimensional point cloud.
Further, the formula for operating on the reference point cloud I frame with the transformation matrix M in the encoding process and the decoding process is as follows:

p' = M · i,  with  M = [ R3×3  T3×1 ; 0  1 ]

where i is any point of the I-frame point cloud (in homogeneous coordinates), p' is the corresponding point of the predicted point cloud, M is the transformation matrix, R3×3 is the rotation matrix, and T3×1 is the translation matrix.
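As a minimal illustration of this prediction formula (assuming NumPy; the function and variable names are not from the patent), a point of the I frame is moved into the predicted frame by one homogeneous matrix multiplication:

```python
import numpy as np

def predict_point(i_point, R, T):
    """Apply the homogeneous transform M = [[R, T], [0, 1]] to an
    I-frame point i, yielding the predicted point p' = M * i."""
    M = np.eye(4)
    M[:3, :3] = R
    M[:3, 3] = np.asarray(T).ravel()
    i_h = np.append(i_point, 1.0)      # homogeneous coordinates
    return (M @ i_h)[:3]

# Identity rotation plus a pure translation just shifts the point by T.
p = predict_point(np.array([1.0, 2.0, 3.0]),
                  np.eye(3), np.array([0.5, 0.0, -1.0]))
```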
Compared with the prior art, the invention has the beneficial effects that:
1. The invention maps the geometric information of a three-dimensional point cloud into a two-dimensional distance map and divides the point cloud sequence into a reference point cloud and a point cloud to be encoded. Using the pose relationship between the reference point cloud and the point cloud to be encoded, an inter-frame prediction method for the point cloud to be encoded is designed: the point cloud to be encoded is predicted with a transformation matrix, so that only the difference between the prediction and the original point cloud needs to be encoded. The whole encoding and decoding process achieves data compression of a continuous point cloud sequence, reduces the amount of data required for transmission, reduces the time-domain redundancy of the point cloud sequence, and improves compression performance;
2. The method provided by the invention achieves high coding performance when encoding and decoding laser radar point cloud sequences.
Drawings
FIG. 1 is a schematic diagram of the encoding process of the present invention;
FIG. 2 is a schematic diagram of the decoding process of the present invention.
Detailed Description
The invention is further described below with reference to the drawings and preferred embodiments.
The invention provides a pose state-based laser radar point cloud inter-frame encoding and decoding method. In the encoding process, the geometric information of the three-dimensional point cloud is first mapped into a two-dimensional distance map, and the point cloud sequence is divided into a reference point cloud and a point cloud to be encoded. An inter-frame prediction method for the point cloud to be encoded is designed using the pose relationship between the reference point cloud and the point cloud to be encoded; finally the prediction residual is quantized and encoded, and the transformation matrix, the residual coding information, and the reference point cloud coding information are synthesized into an output code stream. The decoding process sequentially recovers the transformation matrix, the residual coding information, and the reference point cloud coding information from the code stream to obtain the original distances of the reference point cloud and the point cloud to be encoded on the two-dimensional distance map, and finally converts the distance maps back into the three-dimensional point cloud.
The encoding process comprises the following steps:
S1, dividing the point cloud sequence into a reference point cloud and a point cloud to be encoded, and mapping their geometric information into two-dimensional distance maps to respectively obtain an I frame original distance map of the reference point cloud and a P frame original distance map of the point cloud to be encoded. The P frame point cloud is predicted from the I frame point cloud combined with the pose information; the number of frames of point cloud to be encoded that are predicted from one frame of reference point cloud is determined by a parameter n. In this embodiment the parameter n = 3 is set, i.e., the point cloud sequence is encoded in the "IPPPIPPP…" form;
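The mapping in step S1 from 3-D geometry to a 2-D distance map can be sketched as a spherical (range-image) projection. This is a common LiDAR projection offered here only as a plausible sketch: the image resolution and vertical field-of-view values are illustrative (roughly those of a 64-beam sensor) and are not specified by the patent.

```python
import numpy as np

def points_to_range_image(points, h=64, w=1024,
                          fov_up=np.deg2rad(2.0),
                          fov_down=np.deg2rad(-24.8)):
    """Map Nx3 LiDAR points to an h x w distance (range) image.
    Each point's azimuth selects a column, its elevation a row, and
    the pixel stores the point's range (distance to the sensor)."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)
    yaw = np.arctan2(y, x)                                  # [-pi, pi]
    pitch = np.arcsin(np.clip(z / np.maximum(r, 1e-9), -1.0, 1.0))
    u = ((yaw + np.pi) / (2 * np.pi) * w).astype(int) % w   # column
    v = ((fov_up - pitch) / (fov_up - fov_down) * h).astype(int)
    v = np.clip(v, 0, h - 1)                                # row
    img = np.zeros((h, w), dtype=np.float32)
    img[v, u] = r                                           # store range
    return img
```

The inverse mapping (distance map back to points) follows by reversing the same angular grid, which is what the decoder's step M5 relies on.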
S2, acquiring the pose state according to the IMU information in the Kitti dataset, and calculating the transformation matrix M from the pose state. The transformation matrix is calculated from the translational acceleration a and the rotation rate ω in the IMU information, via a rotation matrix R3×3 and a translation matrix T3×1. The translation matrix is calculated as shown in Equation 1, where the translation parameters (Δx, Δy, Δz) are obtained from a by first-order Runge-Kutta numerical integration:

T3×1 = (Δx, Δy, Δz)^T    (Equation 1)

Similarly, the rotation matrix R3×3 is calculated as shown in Equation 2, where the rotation angles (Δφ, Δθ, Δψ) are obtained from ω by first-order Runge-Kutta numerical integration:

R3×3 = Rz(Δψ) Ry(Δθ) Rx(Δφ)    (Equation 2)

Finally, the transformation matrix M is obtained from the translation matrix T3×1 and the rotation matrix R3×3, as shown in Equation 3:

M = [ R3×3  T3×1 ; 0  1 ]    (Equation 3)
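Under the stated first-order Runge-Kutta (i.e., single Euler-step) integration, the construction of M might be sketched as follows. The initial velocity v0, the Z-Y-X Euler convention, and all function names are illustrative assumptions, not details fixed by the patent:

```python
import numpy as np

def pose_to_transform(accel, omega, v0, dt):
    """Build a 4x4 transform M between two frames from IMU readings:
    translational acceleration `accel`, rotation rates `omega`
    (roll, pitch, yaw rates), assumed initial velocity `v0`, and
    inter-frame interval `dt`."""
    # Translation from one Euler step of the acceleration.
    T = v0 * dt + 0.5 * accel * dt ** 2
    # Angles from one Euler step of the rotation rates.
    roll, pitch, yaw = omega * dt
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    Rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
    R = Rz @ Ry @ Rx              # assumed Z-Y-X Euler composition
    M = np.eye(4)                 # M = [[R, T], [0, 1]]
    M[:3, :3] = R
    M[:3, 3] = T
    return M
```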
S3, operating on the reference point cloud I frame with the transformation matrix M to obtain a predicted distance map of the P frame point cloud to be encoded;
S4, taking the difference between the predicted distance map obtained in step S3 and the original distance map of the P frame in step S1 to obtain a prediction residual;
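Steps S3 and S4 together can be sketched as: move the I-frame points with M, re-project them to a distance map, and subtract. Here `project` stands for any 3-D-to-distance-map mapping (an assumed helper, e.g. a spherical projection); the function names are illustrative:

```python
import numpy as np

def predict_and_residual(i_points, M, p_range_image, project):
    """Predict the P-frame distance map by transforming the I-frame
    points with M (p' = M * i) and re-projecting them, then return the
    prediction and its residual against the original P-frame map."""
    ones = np.ones((i_points.shape[0], 1))
    pred_points = (np.hstack([i_points, ones]) @ M.T)[:, :3]  # step S3
    pred_image = project(pred_points)
    residual = p_range_image - pred_image                      # step S4
    return pred_image, residual
```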
S5, carrying out quantization processing and JPEG-LS lossless coding on the prediction residual obtained in step S4 to obtain residual coding information; the quantization processing and JPEG-LS lossless coding operations are described in detail in the following papers:
Quantization processing: Tu C., Takeuchi E., Carballo A., et al. Real-time streaming point cloud compression for 3D LiDAR sensor using U-NET. IEEE Access, 2019, 7: 113616-113625.
JPEG-LS: Papadonikolakis M. E., Kakarountas A. P., et al. Efficient high-performance implementation of JPEG-LS encoder. Journal of Real-Time Image Processing, 2008, 3(4): 303-310.
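A uniform scalar quantizer is one plausible reading of the quantization step in S5 (the cited papers define the exact scheme; the step size here is illustrative). The lossless coding stage itself is not reproduced, since JPEG-LS is specified by the external reference:

```python
import numpy as np

def quantize(residual, step=1.0):
    """Uniform scalar quantization of the prediction residual prior to
    lossless coding; smaller steps mean lower distortion, larger code."""
    return np.round(residual / step).astype(np.int32)

def dequantize(q, step=1.0):
    """Inverse quantization, as used on the decoder side (step M3)."""
    return q.astype(np.float32) * step
```

With a uniform quantizer, the reconstruction error per sample is bounded by half the step size, which is the usual trade-off controlled by the residual quantization parameter mentioned in the experiments.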
S6, the original distance map of the I frame in the step S1 is also encoded by JPEG-LS to obtain I frame encoding information;
S7, synthesizing the transformation matrix M in step S2, the residual coding information in step S5, and the I frame coding information in step S6 into an output code stream.
The decoding process comprises the following specific steps:
M1, after receiving the code stream, taking out the I frame coding information in the code stream and obtaining the I frame distance map through JPEG-LS decoding;
M2, taking out the transformation matrix M in the code stream and applying it to the I frame distance map obtained in step M1 to obtain a predicted distance map;
M3, taking out the residual coding information in the code stream and performing inverse quantization and JPEG-LS decoding on it to obtain the prediction residual;
M4, reconstructing the P frame distance map by combining the prediction residual obtained in step M3 with the predicted distance map obtained in step M2;
M5, converting the I frame distance map obtained in step M1 and the P frame distance map obtained in step M4 into point clouds, finally recovering the two-dimensional distance maps into the three-dimensional point cloud.
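Steps M2 to M4 can be sketched as a single reconstruction routine: the decoder re-runs the same prediction the encoder used, de-quantizes the residual, and adds the two. `predict` here is an assumed helper shared with the encoder, and the names are illustrative:

```python
import numpy as np

def decode_p_frame(i_range_image, M, q_residual, step, predict):
    """Reconstruct one P-frame distance map on the decoder side from
    the decoded I-frame map, the transmitted transform M, and the
    quantized residual."""
    pred_image = predict(i_range_image, M)            # step M2
    residual = q_residual.astype(np.float32) * step   # step M3 (inverse quant.)
    return pred_image + residual                      # step M4
```

Because prediction is driven only by data the decoder also has (the I frame and M), the encoder and decoder stay in sync without transmitting the predicted map itself.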
The formula for operating on the reference point cloud I frame with the transformation matrix M in the encoding process and the decoding process is as follows:

p' = M · i,  with  M = [ R3×3  T3×1 ; 0  1 ]

where i is any point of the I-frame point cloud (in homogeneous coordinates), p' is the corresponding point of the predicted point cloud, M is the transformation matrix, R3×3 is the rotation matrix, and T3×1 is the translation matrix.
In this embodiment, experimental tests were performed on four scene sequences of the Kitti dataset using the proposed method, and the experimental results are shown in Table 1. In the experiment, 100 frames of each scene sequence are selected for encoding and decoding, the data in the table are results averaged over one frame of point cloud, and the residual quantization parameter is set to its minimum value;
where CR represents the compression rate, Size_encoded represents the size of the encoded output code stream, Size_original represents the size of the original data, RMSE is the root mean square error, and MAX = 13000;
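The two reported metrics might be computed as below; the direction of the CR ratio (encoded size over original size) is an assumption, since the patent's formula is not reproduced here:

```python
import numpy as np

def compression_rate(size_encoded, size_original):
    """CR as the ratio of coded stream size to original data size
    (assumed direction; smaller is better under this convention)."""
    return size_encoded / size_original

def rmse(original, reconstructed):
    """Root mean square error between original and reconstructed data."""
    a = np.asarray(original, dtype=np.float64)
    b = np.asarray(reconstructed, dtype=np.float64)
    return float(np.sqrt(np.mean((a - b) ** 2)))
```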
Table 1 experimental test results
From the test results, the method provided by the invention has higher coding performance when coding and decoding the laser radar point cloud sequence.
Although the present invention has been described in terms of preferred embodiments, it is not limited to those embodiments. Any person skilled in the art may, without departing from the spirit and scope of the present invention, make variations and modifications to the technical solution of the invention using the methods and technical content disclosed above; therefore, any simple modification, equivalent variation, or refinement of the above embodiments according to the technical substance of the present invention falls within the protection scope of the technical solution of the present invention. The foregoing describes only preferred embodiments of the invention, and all changes and modifications that come within the meaning and range of equivalency of the claims are intended to be embraced therein.