
CN119366184A - A point cloud inter-frame compensation method, encoding and decoding method, device and system - Google Patents


Info

Publication number
CN119366184A
Authority
CN
China
Prior art keywords
point cloud
scale
current frame point cloud
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202280096827.XA
Other languages
Chinese (zh)
Inventor
马展
王剑强
魏红莲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd filed Critical Guangdong Oppo Mobile Telecommunications Corp Ltd
Publication of CN119366184A


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/124Quantisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

A point cloud inter-frame compensation method, encoding and decoding method, device and system. A sparse convolution-based downsampler extracts implicit feature data of a reference frame point cloud; a compensator embeds the implicit feature information of the reference frame into the current frame point cloud of the corresponding scale; the compensated implicit feature data of the low-scale current frame point cloud are fed into a sparse convolution-based upsampler and a probability predictor, which output the occupancy probabilities of the voxels in the high-scale current frame point cloud, from which the geometric data of the high-scale current frame point cloud are reconstructed. The disclosed embodiments use the compensated features directly to reconstruct the current frame, giving a simple structure and efficient performance.

Description

Point cloud inter-frame compensation method, encoding and decoding method, device and system
Technical Field
The embodiments of the present disclosure relate to, but are not limited to, point cloud compression techniques, and more particularly, to a point cloud inter-frame compensation method, a codec method, a device, and a system.
Background
A point cloud is a set of irregularly distributed discrete points in space that expresses the spatial structure and surface properties of a three-dimensional object or scene. It is three-dimensional data, a set of vectors in a three-dimensional coordinate system, in which each point carries (x, y, z) coordinates and may also carry attribute information such as color and reflectance. With the rapid development of emerging technologies such as augmented reality, virtual reality, autonomous driving and robotics, point clouds have become one of the main forms of compact three-dimensional data. However, point cloud data are very large in volume; storing them directly consumes a large amount of memory and is unfavorable for transmission, so the performance of point cloud compression needs to be continuously improved.
Summary of The Invention
The following is a summary of the subject matter described in detail herein. This summary is not intended to limit the scope of the claims.
An embodiment of the present disclosure provides a point cloud inter-frame compensation method, including:
Based on the geometric data of the current frame point cloud, performing a sparse convolution operation with inter-frame transformation on the implicit feature data of the reference frame point cloud;
obtaining the implicit feature data of the current frame point cloud according to the result of the sparse convolution operation with inter-frame transformation;
wherein the sparse convolution operation with inter-frame transformation takes each occupied voxel in the current frame point cloud as the center of a convolution kernel and performs the convolution over the implicit feature data on the neighboring voxels in the reference frame point cloud.
An embodiment of the present disclosure further provides a point cloud decoding method, applied to a point cloud decoder, including:
Decoding the geometric code stream to obtain the reconstructed geometric data C_N of the current frame point cloud of the N-th scale, wherein the geometric data of the N-th-scale current frame point cloud is obtained by voxel-downsampling the geometric data of the first-scale current frame point cloud N-1 times, and N is an integer greater than or equal to 2;
sequentially obtaining the reconstructed geometric data of the current frame point cloud from the (N-1)-th scale to the first scale, wherein the reconstructed geometric data of the i-th-scale current frame point cloud are obtained by:
obtaining the implicit feature data of the (i+1)-th-scale current frame point cloud according to the reconstructed geometric data of the (i+1)-th-scale current frame point cloud and the implicit feature data of the (i+1)-th-scale reference frame point cloud, by the method according to any embodiment of the present disclosure;
performing sparse convolution-based voxel upsampling and probability prediction on the implicit feature data of the (i+1)-th-scale current frame point cloud to obtain the occupancy probabilities P_i of voxels in the i-th-scale current frame point cloud, and determining the reconstructed geometric data of the i-th-scale current frame point cloud according to P_i, for i = N-1, N-2, ..., 1.
An embodiment of the present disclosure further provides a point cloud encoding method, applied to a point cloud encoder, including:
performing N-1 voxel downsamplings on the geometric data of the first-scale current frame point cloud to obtain the geometric data of the current frame point cloud from the second scale to the N-th scale; and
performing entropy coding on the geometric data of the N-th-scale current frame point cloud.
An embodiment of the present disclosure further provides a point cloud code stream, where the code stream is obtained according to the point cloud coding method described in any embodiment of the present disclosure.
An embodiment of the present disclosure further provides a point cloud inter-frame compensation device, which includes a processor and a memory storing a computer program, where the processor can implement the point cloud inter-frame compensation method according to any embodiment of the present disclosure when executing the computer program.
An embodiment of the present disclosure further provides a point cloud decoder, including a processor and a memory storing a computer program, where the processor can implement the point cloud decoding method according to any embodiment of the present disclosure when executing the computer program.
An embodiment of the present disclosure further provides a point cloud encoder, including a processor and a memory storing a computer program, where the processor can implement the point cloud encoding method according to any embodiment of the present disclosure when executing the computer program.
An embodiment of the present disclosure further provides a point cloud encoding and decoding system, which includes the point cloud encoder described in any embodiment of the present disclosure, and the point cloud decoder described in any embodiment of the present disclosure.
An embodiment of the present disclosure further provides a non-transitory computer readable storage medium storing a computer program, where the computer program, when executed by a processor, is capable of implementing a point cloud inter-frame compensation method according to any embodiment of the present disclosure, or implementing a point cloud decoding method according to any embodiment of the present disclosure, or implementing a point cloud encoding method according to any embodiment of the present disclosure.
Other aspects will become apparent upon reading and understanding the accompanying drawings and detailed description.
Brief description of the drawings
The accompanying drawings are included to provide an understanding of embodiments of the disclosure, and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain, without limitation, the embodiments.
FIG. 1 is a flow diagram of G-PCC encoding;
FIG. 2 is a flow diagram of G-PCC decoding;
FIG. 3 is a flow chart of a method of point cloud inter-frame compensation according to an embodiment of the present disclosure;
FIGS. 4A and 4B are schematic diagrams of sparse convolution operations with inter-frame transforms in accordance with an embodiment of the present disclosure;
FIG. 5 is a flow chart of a point cloud encoding method according to an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of a point cloud encoding process that may enable lossless compression in accordance with an embodiment of the present disclosure;
FIGS. 7A, 7B and 7C are schematic diagrams of occupancy symbols, implicit characteristic data and occupancy probabilities, respectively, of voxels in a point cloud according to an embodiment of the disclosure;
FIG. 8 is a schematic diagram of a network architecture of a sparse convolution-based downsampler according to an embodiment of the present disclosure;
FIG. 9 is a schematic diagram of a network architecture of a sparse convolution-based downsampler according to another embodiment of the present disclosure;
FIG. 10 is a schematic diagram of a network architecture of an upsampler based on sparse convolution in accordance with an embodiment of the present disclosure;
FIG. 11 is a schematic diagram of a network architecture of a sparse convolution-based upsampler according to another embodiment of the present disclosure;
FIG. 12 is a schematic diagram of a network architecture of a probability predictor according to an embodiment of the present disclosure;
FIG. 13 is a schematic diagram of a point cloud decoding process that may implement lossless compression in accordance with an embodiment of the present disclosure;
FIG. 14 is a schematic diagram of a point cloud decoding process that may implement lossy compression in accordance with an embodiment of the present disclosure;
FIG. 15 is a flow chart of a point cloud decoding method according to an embodiment of the present disclosure;
Fig. 16 is a schematic diagram of a point cloud inter-frame compensation device according to an embodiment of the disclosure.
Detailed description of the preferred embodiments
The present disclosure describes a number of embodiments, but the description is illustrative and not limiting, and it will be apparent to those of ordinary skill in the art that many more embodiments and implementations are possible within the scope of the embodiments described in the present disclosure.
In the description of the present disclosure, words such as "exemplary" or "for example" are used to mean serving as an example, instance, or illustration. Any embodiment described as "exemplary" or "for example" in this disclosure should not be construed as preferred or advantageous over other embodiments. "And/or" herein describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" covers three cases: A exists alone, A and B exist together, and B exists alone. "Plurality" means two or more. In addition, in order to clearly describe the technical solutions of the embodiments of the present disclosure, words such as "first" and "second" are used to distinguish identical or similar items having substantially the same function and effect. Those skilled in the art will appreciate that words such as "first" and "second" do not limit the number or the order of execution, and do not necessarily indicate a difference.
In describing representative exemplary embodiments, the specification may have presented the method and/or process as a particular sequence of steps. However, to the extent that the method or process does not rely on the particular order of steps set forth herein, the method or process should not be limited to the particular sequence of steps described. As will be appreciated by those of ordinary skill in the art, other sequences of steps are possible. Accordingly, the particular order of the steps set forth in the specification should not be construed as limitations on the claims. Furthermore, the claims directed to the method and/or process should not be limited to the performance of their steps in the order written, and one skilled in the art can readily appreciate that the sequences may be varied and still remain within the spirit and scope of the embodiments of the present disclosure.
Point cloud compression algorithms include geometry-based point cloud compression (G-PCC), in which geometry compression is achieved primarily by octree models and/or triangle surface (trisoup) models.
In order to facilitate understanding of the technical solutions provided by the embodiments of the present disclosure, a flow chart of G-PCC encoding and a flow chart of G-PCC decoding are provided first. It should be noted that, the flow chart of G-PCC encoding and the flow chart of G-PCC decoding described in the embodiments of the present disclosure are only for more clearly illustrating the technical solutions of the embodiments of the present disclosure, and do not constitute a limitation of the embodiments of the present disclosure. As can be appreciated by those skilled in the art, with the evolution of the point cloud compression technology and the appearance of new service scenarios, the technical solution provided by the embodiments of the present disclosure is equally applicable to a point cloud compression architecture similar to G-PCC, and the point cloud compressed by the embodiments of the present disclosure may be a point cloud in a video, but is not limited thereto.
In the point cloud G-PCC encoder framework, the point cloud input to the three-dimensional image model is first divided into slices, and each slice is then encoded independently.
The flow chart of G-PCC encoding shown in fig. 1 applies to a point cloud encoder. The point cloud data to be encoded is first divided into a number of slices by slice partitioning, and within each slice the geometric information and the attribute information of the point cloud are encoded separately. In the geometric encoding process, the geometric information is first subjected to coordinate transformation so that the whole point cloud is contained in a bounding box, and is then quantized. Quantization mainly plays a scaling role; because quantization rounds the coordinates, the geometric information of some points becomes identical, and whether to remove these duplicate points can be decided based on parameters. The process of quantization and duplicate-point removal is also called voxelization. The bounding box is then octree-partitioned. In the octree-based geometric encoding flow, the bounding box is divided into eight sub-cubes, the non-empty sub-cubes (those containing points of the point cloud) are further divided into eight, and the division continues until the resulting leaf nodes are 1×1×1 unit cubes; the points in the leaf nodes are then arithmetically encoded to generate a binary geometric bit stream, i.e., the geometric code stream. In the geometric encoding process based on triangle patch sets (trisoup), octree division is also performed first, but unlike octree-based encoding, trisoup does not divide the point cloud step by step down to 1×1×1 unit cubes; instead, division stops at sub-blocks (blocks) with side length W. Based on the surface formed by the distribution of points within each block, at most twelve intersection points (vertices) between the surface and the twelve edges of the block are obtained, and the vertices are arithmetically encoded (surface fitting is performed based on the intersection points) to generate the binary geometric bit stream (i.e., the geometric code stream). The vertices are also used in geometric reconstruction, and the reconstructed geometric information is used when encoding the attributes of the point cloud.
In the attribute encoding process, color conversion is first performed to convert the color information (i.e., attribute information) from the RGB color space to the YUV color space. The point cloud is then recolored with the reconstructed geometric information so that the unencoded attribute information corresponds to the reconstructed geometric information. For encoding the color information there are two main transform methods: a distance-based lifting transform that relies on Level of Detail (LOD) partitioning, and a directly applied Region-Adaptive Hierarchical Transform (RAHT). Both methods transform the color information from the spatial domain to the frequency domain, producing high-frequency and low-frequency coefficients, which are then quantized (i.e., quantized coefficients). Finally, the geometry-encoded data obtained through octree division and surface fitting and the attribute-encoded data obtained through quantized-coefficient processing are combined by slice synthesis, and the vertex coordinates of each block are encoded in turn (i.e., arithmetic encoding) to generate a binary attribute bit stream, i.e., the attribute code stream.
The flow chart of G-PCC decoding shown in FIG. 2 applies to a point cloud decoder. The decoder acquires the binary code stream and independently decodes the geometric bit stream (i.e., geometric code stream) and the attribute bit stream contained in it. When the attribute bit stream is decoded, the attribute information of the point cloud is obtained through arithmetic decoding, inverse quantization, LOD-based inverse lifting or RAHT-based inverse transform, and inverse color conversion, and the three-dimensional image model of the point cloud data is recovered from the geometric information and the attribute information.
Neural network and deep learning techniques can also be applied to point cloud geometry compression, for example volumetric model compression based on three-dimensional convolutional neural networks (3D CNN), compression that applies Multi-Layer Perceptron (MLP)-based neural networks directly to sets of point coordinates, compression that uses an MLP or 3D CNN for probability estimation and entropy coding of octree node symbols, compression based on three-dimensional sparse convolutional neural networks, and so on. According to the density of points, point clouds can be divided into sparse point clouds and dense point clouds. A sparse point cloud covers a large range and is sparsely distributed in three-dimensional space, and typically represents a scene; a dense point cloud covers a small range and is densely distributed, and typically represents an object. The compression performance of these techniques on the two kinds of point cloud often differs considerably: they perform better on dense point clouds and worse on sparse point clouds.
In order to improve the performance of neural-network-based encoding and decoding methods on sparse point clouds, the embodiments of the present disclosure provide a point cloud encoding and decoding method based on point cloud inter-frame compensation, which can realize lossy or lossless compression of point clouds. The embodiments of the present disclosure take the encoding and decoding of point cloud geometric information as an example, but can also be used for encoding and decoding of point cloud attribute information.
When the encoding method of the embodiments of the present disclosure is applied to the G-PCC encoding architecture shown in fig. 1, it can replace the encoding processing performed after voxelization (such as octree partitioning and surface fitting) to obtain the geometric code stream. When the decoding method of the embodiments of the present disclosure is applied to the G-PCC decoding architecture shown in fig. 2, it can replace the decoding processing of the geometric code stream performed before the inverse coordinate transformation (such as octree synthesis and surface fitting) to obtain the reconstructed geometric data of the point cloud. The codec method of the embodiments of the present disclosure may also be used in point cloud codec frameworks other than G-PCC.
Unlike the codec flows of fig. 1 and fig. 2, the codec method of this embodiment is a neural-network-based inter-frame codec method, that is, the reference frame point cloud is used to assist the encoding and decoding of the current frame point cloud. The reference frame point cloud may be the point cloud of the frame preceding the current frame, but is not limited thereto.
Before the geometric data of a point cloud is encoded, voxelization of the geometric information must be completed. After voxelization, the point cloud is presented in the form of a voxel grid. A voxel is the smallest unit of the voxel grid: a point in the point cloud corresponds to an occupied voxel (i.e., a non-empty voxel), while an unoccupied voxel (i.e., an empty voxel) indicates that there is no point at that location. The geometric data (i.e., geometric information) of the point cloud can have different representations. In one example, the geometric data of the point cloud can be represented by the occupancy symbols of the voxels in the point cloud, with occupied voxels marked 1 and unoccupied voxels marked 0, yielding a hierarchical representation as a binary symbol sequence. The occupancy symbols of the voxels can also be fed into a neural network as feature data on the voxels. In other examples, the geometric data of the point cloud can be represented in the form of a sparse tensor, for example as the coordinates of all points in the point cloud arranged in an agreed order. The different representations of the point cloud geometric data can be converted into one another.
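As an illustration of these two representations and their interconversion, the following sketch (plain NumPy; the grid size and the point coordinates are made-up toy values, not taken from any figure) converts a small set of quantized point coordinates into an occupancy-symbol grid and back:

```python
import numpy as np

def points_to_occupancy(points, grid_size):
    """Convert integer (x, y, z) point coordinates into a binary occupancy grid
    (1 = occupied voxel, 0 = empty voxel)."""
    occ = np.zeros((grid_size,) * 3, dtype=np.uint8)
    occ[points[:, 0], points[:, 1], points[:, 2]] = 1
    return occ

def occupancy_to_points(occ):
    """Convert an occupancy grid back to point coordinates (sparse-tensor style)."""
    return np.argwhere(occ == 1)

# Toy example: three occupied voxels in an 8x8x8 grid.
pts = np.array([[0, 0, 0], [3, 4, 1], [7, 7, 7]])
occ = points_to_occupancy(pts, grid_size=8)
assert (occupancy_to_points(occ) == pts).all()  # points recovered, in lexicographic order
```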
In order to use the reference frame point cloud to assist the encoding and decoding of the current frame point cloud, implicit feature data (a latent feature or latent representation), i.e., feature data extracted by a neural network, can be extracted from the geometric data of the reference frame point cloud. However, using the implicit feature data of the reference frame point cloud directly to reconstruct the geometric data of the current frame point cloud does not give satisfactory results.
An embodiment of the present disclosure provides a point cloud inter-frame compensation method to reconstruct data (geometric data or attribute data) of a current frame point cloud by using implicit feature data of a reference frame point cloud more effectively, where the point cloud inter-frame compensation method may be applied to a point cloud decoder, and may also be applied to a point cloud encoder when performing lossless compression.
As shown in fig. 3, the method includes:
Step 110, based on the geometric data of the current frame point cloud, performing a sparse convolution operation with inter-frame transformation on the implicit feature data of the reference frame point cloud;
Step 120, obtaining the implicit feature data of the current frame point cloud according to the result of the sparse convolution operation with inter-frame transformation.
The sparse convolution operation with inter-frame transformation takes each occupied voxel in the current frame point cloud as the center of a convolution kernel and performs the convolution over the implicit feature data on the neighboring voxels in the reference frame point cloud.
A conventional sparse convolution operation on point cloud geometric data takes each occupied voxel in the current frame point cloud as the center of a convolution kernel and performs one convolution over the data on the neighboring voxels in the current frame point cloud. By contrast, to transform the implicit feature data of the reference frame point cloud into implicit feature data of the current frame point cloud, this embodiment uses a sparse convolution operation with inter-frame transformation: when the convolution is computed, an occupied voxel in the current frame point cloud serves as the center of the convolution kernel, while the data being convolved are the implicit feature data on the neighboring voxels of the reference frame point cloud.
The sparse convolution operation with inter-frame transformation is three-dimensional; for ease of illustration, figs. 4A and 4B show only two of the dimensions. The top of fig. 4A represents voxels in the current frame point cloud and the bottom represents voxels in the reference frame point cloud, where the black boxes denote occupied voxels. The top of fig. 4B shows the points in the current frame point cloud and the bottom shows the points in the reference frame point cloud; points in a point cloud and occupied voxels are simply different representations of the same thing. It should be understood that the voxels of the current frame point cloud and those of the reference frame point cloud are drawn separately only for ease of illustration: after voxelization, the two point clouds lie in the same voxel space, so a voxel of the current frame point cloud is also a voxel of the reference frame point cloud.
Figs. 4A and 4B show one convolution operation. The center of the convolution kernel is an occupied voxel of the current frame point cloud (upper parts of figs. 4A and 4B), while the convolved data are the implicit feature data on the neighboring voxels of the reference frame point cloud (lower parts of figs. 4A and 4B). The convolution kernel is illustrated as 3×3×3, of which only two dimensions are shown. The neighboring voxels of the reference frame point cloud are the voxels covered by the convolution kernel, and may or may not include the voxel at the center of the kernel.
In one example of this embodiment, the implicit feature data are extracted from the geometric data, and during the convolution operation the implicit feature value on the voxel at the center of the convolution kernel is set to 1, which is equivalent to using the occupancy feature of that voxel in the current frame point cloud. In other examples, the implicit feature data of the reference frame point cloud on the voxel at the kernel center may instead participate in the convolution operation.
Performing one such convolution over the implicit feature data of the neighboring voxels of the reference frame point cloud for each occupied voxel of the current frame point cloud yields the implicit feature data of the current frame point cloud. That is, for each point of the current frame point cloud, the implicit feature data of its neighboring points in the reference frame point cloud are aggregated into the implicit feature data of that point.
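The following is a minimal dense-tensor sketch of this operation in plain PyTorch, not a true sparse-convolution implementation; the kernel size, the channel counts and the function name interframe_compensation are illustrative assumptions, and the detail in which the center-voxel feature is replaced by 1 is omitted. What it shows is the essential point: the convolution result is kept only at voxels occupied in the current frame, while the features being convolved come from the reference frame.

```python
import torch
import torch.nn.functional as F

def interframe_compensation(cur_occ, ref_feat, weight):
    """Sparse convolution with inter-frame transformation (dense illustration).

    cur_occ:  (D, H, W) binary occupancy of the current frame point cloud
    ref_feat: (C_in, D, H, W) implicit feature data of the reference frame point cloud
    weight:   (C_out, C_in, k, k, k) convolution kernel
    Returns   (C_out, D, H, W) implicit features of the current frame,
              non-zero only at voxels occupied in the current frame.
    """
    k = weight.shape[-1]
    # Convolve the reference-frame features at every location (a real sparse
    # convolution would evaluate only at the occupied voxels)...
    out = F.conv3d(ref_feat.unsqueeze(0), weight, padding=k // 2).squeeze(0)
    # ...and keep the result only where the kernel is centered on an occupied
    # voxel of the *current* frame: this is the "inter-frame transformation".
    return out * cur_occ.unsqueeze(0)

# Toy sizes: 8-channel reference features on a 16^3 grid, 5x5x5 kernel, 16 output channels.
cur_occ  = (torch.rand(16, 16, 16) > 0.9).float()
ref_feat = torch.randn(8, 16, 16, 16)
weight   = torch.randn(16, 8, 5, 5, 5)
cur_feat = interframe_compensation(cur_occ, ref_feat, weight)
```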
In one example of this embodiment, the result of the sparse convolution operation with inter-frame transformation can be used directly as the implicit feature data of the current frame point cloud. In other examples, other operations such as sparse convolutions without inter-frame transformation or linear transforms may be added before and after the sparse convolution with inter-frame transformation. Here, a sparse convolution mentioned without further qualification means a sparse convolution that is not accompanied by an inter-frame transformation.
In one example of this embodiment, the convolution kernel has a size of 5 or more in each of the three dimensions, but the present disclosure is not limited thereto; the size of the convolution kernel can be adjusted according to actual needs.
In one example of this embodiment, the above inter-frame compensation may be implemented by a compensator (which may also be referred to as a predictor): the geometric data of the current frame point cloud and the implicit feature data of the reference frame point cloud are taken as inputs of the compensator, and the implicit feature data of the current frame point cloud are obtained from its output.
The point cloud inter-frame compensation method of this embodiment transforms the implicit feature data of the reference frame point cloud into implicit feature data of the current frame point cloud, so that the feature information of the reference frame point cloud can be used when the geometric data of the current frame point cloud are reconstructed. This effectively exploits the correlation between a point in the current frame point cloud and its neighboring points in the reference frame point cloud, and improves the accuracy of current frame reconstruction.
An embodiment of the present disclosure provides a lossy point cloud encoding method, applied to a point cloud encoder, as shown in fig. 5, including:
Step 210, performing N-1 voxel downsamplings on the geometric data of the first-scale current frame point cloud to obtain the geometric data of the current frame point cloud from the second scale to the N-th scale, where N is an integer greater than or equal to 2;
Step 220, performing entropy coding on the geometric data of the N-th-scale current frame point cloud.
The current frame point cloud of the first scale is the current frame point cloud of the original scale.
In this embodiment, the voxel downsampling of the geometric data of the first-scale point cloud may be implemented by pooling. For example, if a max-pooling layer with a stride of 2×2×2 is used, every 8 voxels of the first-scale point cloud are merged into 1 voxel of the second-scale point cloud, realizing one voxel downsampling; each voxel downsampling halves the size of the point cloud in each of the three dimensions. Of two adjacent scales, the larger point cloud may be called the high-scale point cloud and the smaller one the low-scale point cloud. Among the point clouds obtained by N-1 voxel downsamplings, the N-th-scale point cloud has the smallest scale and the smallest data volume, and requires little bandwidth to transmit.
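A dense-tensor illustration of this pooling-based voxel downsampling follows (plain PyTorch; the grid sizes follow the example of fig. 6, while the occupancy values are random):

```python
import torch
import torch.nn.functional as F

# Binary occupancy of the first-scale point cloud, e.g. an 8x8x4 grid as in FIG. 6.
occ_scale1 = (torch.rand(1, 1, 8, 8, 4) > 0.8).float()

# One voxel downsampling: a 2x2x2 max pooling merges each block of 8 voxels into one.
# The merged voxel is occupied if any of its 8 child voxels is occupied.
occ_scale2 = F.max_pool3d(occ_scale1, kernel_size=2, stride=2)   # 4x4x2
occ_scale3 = F.max_pool3d(occ_scale2, kernel_size=2, stride=2)   # 2x2x1

print(occ_scale1.shape, occ_scale2.shape, occ_scale3.shape)
```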
Referring to fig. 6, the third-scale point cloud contains 2×2×1 voxels, the second-scale point cloud contains 4×4×2 voxels, and the first-scale point cloud contains 8×8×4 voxels; only the occupied voxels of each scale are drawn as solid cubes. The point cloud of fig. 6 is merely exemplary, and an actual point cloud typically contains many more voxels. The geometric data of the low-scale point cloud and the geometric data of the high-scale point cloud are correlated to a certain degree: for example, if an occupied voxel of the low-scale point cloud is surrounded by occupied voxels (e.g., when the voxel lies in the interior of an object), then after this voxel is decomposed into several voxels of the high-scale point cloud, those voxels are very likely to be occupied as well. Such correlations can be captured by the features extracted by the neural network.
The entropy encoding of this and other embodiments may use context-based adaptive binary arithmetic coding (CABAC) or other lossless coding methods.
In this embodiment, the geometric data of the first-scale current frame point cloud is voxel-downsampled N-1 times to obtain the geometric data of the current frame point cloud from the second scale to the N-th scale, and the geometric data of the N-th-scale current frame point cloud is then entropy-encoded; that is, only the geometric data of the smallest-scale current frame point cloud is encoded and written into the code stream. After decoding the geometric data of the smallest-scale current frame point cloud, the decoding end performs inter-frame compensation in combination with the implicit feature data of the reference frame point cloud and obtains the reconstructed geometric data of the first-scale current frame point cloud scale by scale. This embodiment realizes lossy compression and requires little bandwidth.
An embodiment of the disclosure further provides a lossless point cloud encoding method.
In the encoding of this embodiment, the geometric data of the first-scale current frame point cloud also needs to be voxel-downsampled N-1 times to obtain the geometric data of the current frame point cloud from the second to the N-th scale, and the geometric data of the N-th-scale current frame point cloud is entropy-encoded. On this basis, this embodiment additionally determines and entropy-encodes the encoded data of the current frame point cloud from the (N-1)-th scale to the first scale.
The encoded data X_i of the i-th-scale current frame point cloud is determined in the following manner, for i = N-1, N-2, ..., 1 (a sketch of this per-scale loop is given after these steps):
obtaining the implicit feature data of the (i+1)-th-scale current frame point cloud according to the geometric data of the (i+1)-th-scale current frame point cloud and the implicit feature data of the (i+1)-th-scale reference frame point cloud, by the point cloud inter-frame compensation method of any embodiment of the present disclosure;
performing sparse convolution-based voxel upsampling and probability prediction on the implicit feature data of the (i+1)-th-scale current frame point cloud to obtain the occupancy probabilities of the voxels in the i-th-scale current frame point cloud;
entropy-encoding the occupancy symbols of the voxels in the i-th-scale current frame point cloud according to the occupancy probabilities of the voxels in the i-th-scale current frame point cloud, generating the encoded data X_i of the i-th-scale current frame point cloud.
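The per-scale loop can be sketched as follows; compensator, upsample_predict and entropy_encode stand for the modules described in this embodiment and are assumed to be provided by the caller rather than defined here.

```python
def encode_lossless(cur_geom, ref_feat, N, compensator, upsample_predict, entropy_encode):
    """Sketch of the per-scale lossless encoding loop.

    cur_geom[s]: occupancy of the current frame point cloud at scale s (s = 1..N)
    ref_feat[s]: implicit feature data of the reference frame point cloud at scale s
    """
    codes = {N: entropy_encode(cur_geom[N], None)}           # smallest scale, coded directly
    for i in range(N - 1, 0, -1):                            # i = N-1, N-2, ..., 1
        # Inter-frame compensation at scale i+1: embed reference-frame features
        # into the current frame point cloud of the same scale.
        feat = compensator(cur_geom[i + 1], ref_feat[i + 1])
        # Sparse-convolution-based voxel upsampling + probability prediction -> scale i.
        prob_i = upsample_predict(feat)
        # Entropy-encode the occupancy symbols of scale i under the predicted probabilities.
        codes[i] = entropy_encode(cur_geom[i], prob_i)
    return codes
```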
Fig. 6 is a schematic diagram of the lossless encoding method of this embodiment, taking N=3 as an example. As shown in the figure, in addition to entropy-encoding the geometric data of the third-scale current frame point cloud, the encoding end also needs to obtain, by entropy encoding, the encoded data X_1 used to reconstruct the geometric data of the first-scale point cloud and the encoded data X_2 used to reconstruct the geometric data of the second-scale point cloud.
Take the second-scale current frame point cloud in fig. 6 as an example. To obtain the encoded data X_2, inter-frame compensation is first performed according to the geometric data of the third-scale current frame point cloud and the implicit feature data of the third-scale reference frame point cloud, and the implicit feature data of the third-scale current frame point cloud are obtained through the sparse convolution operation with inter-frame transformation. Sparse convolution-based voxel upsampling and probability prediction are then performed on the implicit feature data of the third-scale current frame point cloud to obtain the occupancy probabilities of the voxels in the second-scale current frame point cloud. The occupancy symbols of the voxels in the second-scale current frame point cloud are entropy-encoded according to these occupancy probabilities, generating the encoded data of the second-scale current frame point cloud. The occupancy symbols of the voxels in the second-scale current frame point cloud can be obtained directly from, or converted from, the geometric data of the second-scale current frame point cloud.
For an unoccupied voxel in the low-scale point cloud, the voxels of the high-scale point cloud obtained by decomposing it are all unoccupied, and no probability prediction is needed; only the voxels of the high-scale point cloud obtained by decomposing occupied voxels of the low-scale point cloud need to be predicted. Suppose the third-scale implicit feature data of the current frame point cloud are as shown in fig. 7B, where the voxel in the lower left corner is unoccupied. After sparse convolution-based voxel upsampling, each occupied voxel in fig. 7B is decomposed into 8 voxels; whether those 8 voxels are occupied is uncertain and must be predicted. The probability prediction result is shown in fig. 7C (only the voxels facing the reader are shown, and the 4 voxels in the lower left corner are not predicted). The closer the estimated occupancy probability of an actually occupied voxel is to 1, and the closer that of an actually unoccupied voxel is to 0, the more accurate the prediction. The more accurate the prediction, the less encoded data is produced when the occupancy symbols of the voxels in the second-scale point cloud are entropy-encoded according to their occupancy probabilities; in other words, the more accurate the probability prediction, the better the compression performance for the point cloud geometric information. It should be noted that the prediction probabilities shown in fig. 7C are only for convenience of description and should not be understood as results of an actual operation.
The occupancy probabilities of the voxels in the second-scale current frame point cloud can be represented by a sequence of the probabilities predicted for each voxel in fig. 7C, with the occupancy probability of voxels for which no prediction is made set to 0, while the occupancy symbols of the voxels in the second-scale current frame point cloud can be represented by a sequence of the occupancy symbols of each voxel in fig. 7A. The two sequences are input into an entropy coder, and the coding result is the second encoded data, thereby realizing lossless compression of the geometric data of the second-scale point cloud. Correspondingly, after decoding the second encoded data, the decoding end entropy-decodes it together with the occupancy probabilities of the voxels in the second-scale current frame point cloud predicted in the same way, so the reconstructed geometric data of the second-scale current frame point cloud are lossless geometric data.
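The benefit of accurate prediction can be made concrete with the ideal code length of arithmetic coding: a voxel with occupancy symbol s and predicted occupancy probability p costs about -log2(p) bits if s = 1 and -log2(1 - p) bits if s = 0. A small numeric sketch (toy values, not taken from the figures):

```python
import numpy as np

def estimated_bits(symbols, probs, eps=1e-9):
    """Ideal arithmetic-coding cost of binary occupancy symbols under predicted probabilities."""
    p = np.clip(probs, eps, 1.0 - eps)
    return float(np.sum(-(symbols * np.log2(p) + (1 - symbols) * np.log2(1 - p))))

symbols = np.array([1, 1, 0, 1, 0, 0, 0, 1])            # true occupancy of 8 child voxels
good    = np.array([0.9, 0.8, 0.1, 0.95, 0.2, 0.05, 0.1, 0.85])
uniform = np.full(8, 0.5)
print(estimated_bits(symbols, good))     # ~1.5 bits
print(estimated_bits(symbols, uniform))  # 8.0 bits
```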
Fig. 6 also shows the process of obtaining the encoded data X_1 of the first-scale current frame point cloud, which is similar to the process of obtaining the encoded data X_2 and is not described again.
In an example of this embodiment, the implicit feature data of the (i+1)-th-scale reference frame point cloud are obtained by performing sparse convolution-based voxel downsampling on the geometric data of the first-scale reference frame point cloud i times. As shown in fig. 6, the implicit feature data of the third-scale reference frame point cloud are obtained by two sparse convolution-based voxel downsamplings of the first-scale reference frame point cloud, and the implicit feature data of the second-scale reference frame point cloud are obtained by one sparse convolution-based voxel downsampling of the first-scale reference frame point cloud. In another example of this embodiment, the implicit feature data of the (i+1)-th-scale reference frame point cloud may also be obtained by performing sparse convolution-based voxel downsampling on the geometric data of the i-th-scale reference frame point cloud once.
In an example of this embodiment, the sparse convolution-based voxel downsampling is realized by a downsampler comprising a sparse convolution layer with a stride of 2×2×2 and one or more residual layers arranged before and after the sparse convolution layer. Fig. 8 shows an exemplary downsampler comprising, in order, two residual layers, a sparse convolution layer with a stride of 2×2×2, and two residual layers. Each residual layer may in turn contain several sparse convolution layers (2 in the illustrated example) with an activation function between adjacent sparse convolution layers, and the input of the first sparse convolution layer is added to the output of the last sparse convolution layer to form the output of the residual layer. The residual layer structure of this example is relatively simple; in other examples more complex structures may be used, such as residual layers with multiple branches, and so on.
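A rough dense-tensor sketch of such a downsampler follows, in plain PyTorch; ordinary 3-D convolutions stand in for the sparse convolution layers, and the channel counts and the initial channel-lifting convolution are assumptions not taken from fig. 8.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two 3x3x3 convolutions with an activation in between; input added to output."""
    def __init__(self, ch):
        super().__init__()
        self.conv1 = nn.Conv3d(ch, ch, kernel_size=3, padding=1)
        self.conv2 = nn.Conv3d(ch, ch, kernel_size=3, padding=1)
        self.act = nn.ReLU()

    def forward(self, x):
        return x + self.conv2(self.act(self.conv1(x)))

class Downsampler(nn.Module):
    """Residual layers -> stride-2x2x2 convolution (voxel downsampling) -> residual layers."""
    def __init__(self, in_ch=1, ch=32):
        super().__init__()
        self.pre = nn.Sequential(nn.Conv3d(in_ch, ch, 3, padding=1),
                                 ResidualBlock(ch), ResidualBlock(ch))
        self.down = nn.Conv3d(ch, ch, kernel_size=2, stride=2)   # halves each dimension
        self.post = nn.Sequential(ResidualBlock(ch), ResidualBlock(ch))

    def forward(self, x):
        return self.post(self.down(self.pre(x)))

feat = Downsampler()(torch.rand(1, 1, 16, 16, 16))   # -> (1, 32, 8, 8, 8)
```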
Compared with plain voxel downsampling, sparse convolution-based voxel downsampling performs not only the voxel downsampling itself but also sparse convolution operations, and the sparse convolution operations carry parameters that can be set during training; that is, sparse convolution-based voxel downsampling has the ability to extract features. Herein, "sparse convolution-based voxel downsampling" does not require that the sparse convolution operation and the voxel downsampling be implemented by the same network layer. The sparse convolution operation and the voxel downsampling may be completed simultaneously by a sparse convolution layer with a stride of 2×2×2, as illustrated in fig. 8, or they may be implemented by different layers: for example, voxel downsampling can be performed by replacing the stride-2×2×2 sparse convolution layer in the downsampler of fig. 8 with a max-pooling layer of stride 2×2×2, while the sparse convolution operations are performed by the residual layers.
In an example of this embodiment, sparse convolution-based voxel downsampling may also be implemented with a more complex downsampler, as shown in fig. 9, which comprises, in order: a first sparse convolution network, a first self-attention network, a first residual network, a sparse convolution layer with a stride of 2×2×2, a second residual network, a second self-attention network, and a second sparse convolution network. An activation function is provided between the first sparse convolution network and the first self-attention network, and between the first residual network and the sparse convolution layer; the first and second sparse convolution networks each comprise one or more sparse convolution layers. The stride-2×2×2 sparse convolution layer realizes the voxel downsampling. This downsampler introduces self-attention networks and has a stronger feature extraction capability, which can improve the accuracy of probability prediction and the efficiency of compression coding.
The present disclosure does not impose any limitation on the structure of the downsampler used to implement sparse convolution-based voxel downsampling.
In an example of this embodiment, the sparse convolution-based voxel upsampling is realized by an upsampler comprising a transposed sparse convolution layer with a stride of 2×2×2 and one or more residual layers arranged before and after the transposed sparse convolution layer. Fig. 10 shows an exemplary upsampler comprising, in order, two residual layers, a transposed sparse convolution layer with a stride of 2×2×2, and two residual layers. Each residual layer may in turn contain several sparse convolution layers (2 in the illustrated example) with an activation function between adjacent sparse convolution layers, and the input of the first sparse convolution layer is added to the output of the last sparse convolution layer to form the output of the residual layer.
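A corresponding dense-tensor sketch of such an upsampler (plain PyTorch standing in for the sparse layers; only one residual layer is shown on each side of the transposed convolution, and the channel count is an assumption):

```python
import torch
import torch.nn as nn

class Upsampler(nn.Module):
    """Residual layer -> stride-2x2x2 transposed convolution (voxel upsampling) -> residual layer."""
    def __init__(self, ch=32):
        super().__init__()
        self.res_in  = nn.Sequential(nn.Conv3d(ch, ch, 3, padding=1), nn.ReLU(),
                                     nn.Conv3d(ch, ch, 3, padding=1))
        self.up      = nn.ConvTranspose3d(ch, ch, kernel_size=2, stride=2)  # doubles each dimension
        self.res_out = nn.Sequential(nn.Conv3d(ch, ch, 3, padding=1), nn.ReLU(),
                                     nn.Conv3d(ch, ch, 3, padding=1))

    def forward(self, x):
        x = x + self.res_in(x)        # residual connection before upsampling
        x = self.up(x)
        return x + self.res_out(x)    # residual connection after upsampling

up_feat = Upsampler()(torch.rand(1, 32, 2, 2, 1))   # -> (1, 32, 4, 4, 2)
```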
In another example of this embodiment, sparse convolution-based voxel upsampling is implemented with the neural network shown in fig. 11. This upsampler comprises, in order, a first sparse convolution network, a first self-attention network, a first residual network, a transposed sparse convolution layer with a stride of 2×2×2, a second residual network, a second self-attention network, and a second sparse convolution network, with an activation function between the first sparse convolution network and the first self-attention network and between the first residual network and the transposed sparse convolution layer; the first and second sparse convolution networks each comprise one or more sparse convolution layers. The stride-2×2×2 transposed sparse convolution layer realizes the voxel upsampling. This upsampler introduces self-attention networks and has a stronger feature extraction capability, which can improve the accuracy of probability prediction and the efficiency of compression coding.
In an example of this embodiment, the probability prediction is implemented by several sparse convolution layers and a sigmoid function. For example, probability prediction may be implemented with the probability predictor shown in fig. 12, which comprises 3 sparse convolution layers, 2 activation functions (e.g., ReLU) arranged between adjacent sparse convolution layers, and a sigmoid function as the last layer; the sigmoid function outputs the occupancy probabilities of the voxels in the predicted point cloud, whose values are limited to between 0 and 1. The sparse convolution layers may use SConv with K1³, S1³ and C32, i.e., a kernel size of 1 in each of the three dimensions, a stride of 1, and 32 channels.
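A dense stand-in for this predictor might look as follows; the intermediate channel count of 32 follows the text, while the single-channel output of the last layer is an assumption needed to produce one probability per voxel.

```python
import torch
import torch.nn as nn

# Sketch of the probability predictor of FIG. 12: three convolution layers with kernel
# size 1 and stride 1, ReLU activations between them, and a final sigmoid that maps the
# per-voxel features to an occupancy probability in (0, 1).
predictor = nn.Sequential(
    nn.Conv3d(32, 32, kernel_size=1), nn.ReLU(),
    nn.Conv3d(32, 32, kernel_size=1), nn.ReLU(),
    nn.Conv3d(32, 1,  kernel_size=1), nn.Sigmoid(),
)

up_feat = torch.rand(1, 32, 4, 4, 2)       # upsampled implicit features of the current frame
occupancy_prob = predictor(up_feat)        # (1, 1, 4, 4, 2), values in (0, 1)
```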
Referring to fig. 6, the neural networks used in the encoding architecture of this embodiment include the downsampler (for sparse convolution-based voxel downsampling), the upsampler (for sparse convolution-based voxel upsampling), the compensator (for inter-frame compensation), and so on. They may be trained scale by scale on point cloud samples, and the training loss function may be set to the BCE (Binary Cross Entropy) loss, i.e., the cross entropy between the occupancy probabilities of the voxels in the i-th-scale point cloud and the actual occupancy symbols of the voxels in the i-th-scale point cloud, for i = 1, 2, ..., N-1.
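A hedged sketch of this per-scale BCE loss in plain PyTorch (toy tensor shapes, random values standing in for the predictor output and the ground-truth occupancy):

```python
import torch
import torch.nn.functional as F

# BCE training loss for one scale: cross entropy between the predicted occupancy
# probabilities and the actual occupancy symbols of the voxels at that scale.
occupancy_prob = torch.rand(1, 1, 4, 4, 2)                    # predictor output for scale i
occupancy_true = (torch.rand(1, 1, 4, 4, 2) > 0.7).float()    # ground-truth occupancy symbols
loss_scale_i = F.binary_cross_entropy(occupancy_prob, occupancy_true)
# The total training loss may sum this term over the scales i = 1, ..., N-1.
```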
In the point cloud encoding method of this embodiment, inter-frame compensation does not first estimate motion vectors between the reference frame and the current frame and then compensate according to those motion vectors, as conventional methods do. Instead, the implicit feature data of the reference frame point cloud are transformed into the implicit feature data of the current frame point cloud through a sparse convolution operation with inter-frame transformation, so the processing is simpler and more efficient, and good coding performance can be obtained.
An embodiment of the present disclosure further provides a point cloud decoding method, applied to a point cloud decoder, as shown in fig. 15, including:
Step 310, decoding the geometric code stream to obtain the reconstructed geometric data C_N of the N-th-scale current frame point cloud, wherein the geometric data of the N-th-scale current frame point cloud is obtained by voxel-downsampling the geometric data of the first-scale current frame point cloud N-1 times;
Step 320, sequentially obtaining the reconstructed geometric data of the current frame point cloud from the (N-1)-th scale to the first scale.
The reconstructed geometric data of the i-th-scale current frame point cloud are obtained by the following steps:
obtaining the implicit feature data of the (i+1)-th-scale current frame point cloud according to the reconstructed geometric data of the (i+1)-th-scale current frame point cloud and the implicit feature data of the (i+1)-th-scale reference frame point cloud, by the point cloud inter-frame compensation method of any embodiment of the present disclosure;
performing sparse convolution-based voxel upsampling and probability prediction on the implicit feature data of the (i+1)-th-scale current frame point cloud to obtain the occupancy probabilities P_i of the voxels in the i-th-scale current frame point cloud, and determining the reconstructed geometric data of the i-th-scale current frame point cloud according to P_i, for i = N-1, N-2, ..., 1.
The code stream carrying data (e.g., encoded data) associated with the decoding of geometric data is referred to herein as a geometric code stream.
Corresponding to the encoding side, the point cloud decoding method of this embodiment also has two decoding modes, for lossy compression and lossless compression respectively; both modes need to perform steps 310 and 320 described above. They differ mainly in how the reconstructed geometric data of the i-th-scale current frame point cloud are determined from P_i, and the manner of acquiring the implicit feature data of the (i+1)-th-scale reference frame point cloud may also differ.
In an exemplary embodiment of the present disclosure, a point cloud decoding method is provided that can achieve lossless compression of geometric data.
On the basis of the flow shown in fig. 15, when decoding the code stream this embodiment also obtains the encoded data X_i of the i-th-scale current frame point cloud, where X_i was obtained by entropy-encoding the occupancy symbols of the voxels in the i-th-scale current frame point cloud according to the occupancy probabilities of those voxels. Given X_i, the decoder obtains the occupancy symbols of the voxels in the i-th-scale current frame point cloud, i.e., the reconstructed geometric data of the i-th-scale current frame point cloud, according to the occupancy probabilities of the voxels in the i-th-scale current frame point cloud and the encoded data X_i; these data can be converted into other formats, such as the coordinates of the point cloud points, if needed.
In this embodiment, the implicit feature data of the (i+1)-th-scale reference frame point cloud are obtained by performing sparse convolution-based voxel downsampling on the reconstructed geometric data of the first-scale reference frame point cloud i times, or by performing sparse convolution-based voxel downsampling on the reconstructed geometric data of the i-th-scale reference frame point cloud once. In other examples, the reconstructed geometric data of the (i+1)-th-scale reference frame point cloud may also be used directly as the implicit feature data of the (i+1)-th-scale reference frame point cloud.
In this embodiment, the reconstructed geometric data of the first-scale current frame point cloud and reference frame point cloud obtained by the decoding end should be the original geometric data of the first-scale current frame point cloud and reference frame point cloud. The reconstructed geometric data of the current frame point cloud and reference frame point cloud at the other scales obtained by the decoding end are identical to the geometric data of the current frame point cloud and reference frame point cloud at each scale obtained by the encoding end through voxel downsampling of the first-scale current frame point cloud and reference frame point cloud.
Fig. 13 illustrates the decoding process at the decoding end in the case of lossless compression of the point cloud, taking N=3 as an example. As shown in the figure, the geometric data of the third-scale current frame point cloud is obtained by entropy decoding and can be used directly as the reconstructed geometric data of the third-scale current frame point cloud; point cloud inter-frame compensation is performed together with the implicit feature data of the third-scale reference frame point cloud to obtain the implicit feature data of the third-scale current frame point cloud. Sparse convolution-based upsampling and probability prediction are performed on the implicit feature data of the third-scale current frame point cloud to obtain the occupancy probabilities of the voxels in the second-scale current frame point cloud, and the reconstructed geometric data of the second-scale current frame point cloud are obtained according to these occupancy probabilities and the decoded second encoded data X_2. In the figure, the implicit feature data of the third-scale reference frame point cloud are obtained by performing sparse convolution-based voxel downsampling twice on the reconstructed geometric data of the first-scale reference frame point cloud; they can of course also be obtained by performing sparse convolution-based voxel downsampling once on the reconstructed geometric data of the second-scale reference frame point cloud.
The process of obtaining the reconstructed geometric data of the first-scale current frame point cloud is similar: the reconstructed geometric data of the second-scale current frame point cloud and the implicit feature data of the second-scale reference frame point cloud are fed together into the compensator for point cloud inter-frame compensation, to obtain the implicit feature data of the second-scale current frame point cloud. Sparse convolution-based upsampling and probability prediction are performed on the implicit feature data of the second-scale current frame point cloud to obtain the occupancy probabilities of the voxels in the first-scale current frame point cloud, and the reconstructed geometric data of the first-scale current frame point cloud are obtained according to these occupancy probabilities and the decoded first encoded data X_1. The implicit feature data of the second-scale reference frame point cloud are obtained by performing sparse convolution-based voxel downsampling once on the reconstructed geometric data of the first-scale reference frame point cloud.
In an exemplary embodiment of the present disclosure, a point cloud decoding method for implementing lossy compression of geometric data is provided.
Unlike the lossless case, the present embodiment cannot obtain the encoded data X_i by decoding. After the occupancy probabilities of the voxels in the i-th-scale current frame point cloud are obtained, the occupancy of the voxels in the i-th-scale current frame point cloud is determined by point cloud clipping, where i = N-1, N-2, ..., 1. The occupancy of the voxels in the i-th-scale current frame point cloud constitutes the reconstructed geometric data of the i-th-scale current frame point cloud, which can be converted into coordinates of point cloud points or other formats if needed.
After the occupancy probabilities of the points in the point cloud at a certain scale have been predicted, point cloud clipping can be performed by a simple binary classification. Referring to the examples shown in figs. 7A and 7C, fig. 7A illustrates which voxels in the second-scale current frame point cloud are actually occupied, and fig. 7C illustrates the occupancy probabilities of the voxels in the second-scale current frame point cloud obtained by probability prediction. With binary classification, voxels whose occupancy probability is not less than a set threshold (e.g., 0.5) are determined to be occupied, and voxels whose occupancy probability is less than the set threshold are determined to be unoccupied, thus yielding the reconstructed geometric data of the point cloud. However, point cloud clipping by binary classification alone is sometimes not accurate enough.
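As a concrete illustration, a minimal sketch of the threshold-based binary classification is given below; the function name and the use of NumPy are assumptions for illustration only.

```python
import numpy as np

# Simple binary classification: keep voxels whose predicted occupancy
# probability reaches the threshold (e.g. 0.5), discard the rest.
def prune_by_threshold(prob: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    return prob >= threshold  # boolean occupancy mask over the candidate voxels
```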
To improve the accuracy of point cloud clipping, this embodiment provides a method of assisting clipping with the number of occupied voxels in the point cloud. At the encoding end, the number of occupied voxels (i.e., points) in the current frame point cloud is entropy encoded for the one or more scales at which clipping is to be performed. The decoding method of this embodiment further includes: decoding the geometric code stream to obtain the number K_i of occupied voxels in the i-th-scale current frame point cloud; and, when determining the reconstructed geometric data of the i-th-scale current frame point cloud from the occupancy probabilities P_i of its voxels, grouping the M voxels obtained by decomposing the same voxel in the i-th-scale current frame point cloud, setting the occupancy probability of the m voxels with the highest occupancy probability in each group to 1, then sorting the occupancy probabilities of all voxels in the i-th-scale current frame point cloud and determining the K_i voxels with the highest occupancy probability as occupied voxels, where 1 ≤ m < M and M may be, for example, 8 or 64. For example, the 8 voxels obtained by decomposing the same voxel may form one group, and the occupancy probability of the 1, 2 or 3 voxels with the highest occupancy probability in each group may be set to 1. When decomposing the voxels of a low-scale point cloud, unoccupied voxels need not be decomposed, and at least 1 of the 8 voxels decomposed from an occupied voxel must itself be occupied. This embodiment therefore groups the voxels, first sets the occupancy probability of at least one voxel with the highest occupancy probability in each group to 1, and then sorts the voxels. The implementation is simple, and the accuracy of point cloud clipping can be significantly improved, achieving a particularly good effect.
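The count-assisted clipping described above could be sketched as follows; the function name, the NumPy-based layout (one row of M child probabilities per decomposed parent voxel) and the default m = 1 are assumptions made for illustration.

```python
import numpy as np

def prune_with_count(prob: np.ndarray, k: int, m: int = 1) -> np.ndarray:
    """Count-assisted point cloud clipping sketch.

    prob: (num_parents, M) occupancy probabilities of the M child voxels
          decomposed from each occupied parent voxel (M is typically 8).
    k:    decoded number K_i of occupied voxels at this scale.
    m:    number of children per parent whose probability is forced to 1.
    Returns a boolean mask of the same shape marking occupied voxels.
    """
    boosted = prob.copy()
    # Force the top-m children of every parent to probability 1, because each
    # occupied parent voxel must produce at least one occupied child voxel.
    top_m = np.argsort(-boosted, axis=1)[:, :m]
    np.put_along_axis(boosted, top_m, 1.0, axis=1)
    # Globally keep the k voxels with the highest (boosted) probability.
    flat = boosted.ravel()
    keep = np.argsort(-flat)[:k]
    mask = np.zeros(flat.shape, dtype=bool)
    mask[keep] = True
    return mask.reshape(prob.shape)
```

For example, with M = 8 and m = 1, prune_with_count(prob, K_i) keeps exactly K_i voxels while prioritizing at least one child per decomposed parent, which is what makes it more accurate than plain thresholding.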
In one example of this embodiment, the implicit characteristic data of the i+1th scale reference frame point cloud is reconstructed geometric data of the i+1th scale reference frame point cloud. In other examples, implicit feature data of the i+1th scale reference frame point cloud may also be obtained by downsampling the reconstructed geometry data of the high scale (e.g., i scale, first scale, etc.) reference frame point cloud 1 or more times based on sparse convolution. The embodiments of the present disclosure are not limited in this regard.
Fig. 14 illustrates the decoding process at the decoding end in the case of lossy compression of the point cloud, taking N=3 as an example. As shown in the figure, after the geometric data of the third-scale current frame point cloud is obtained by entropy decoding, it is used directly as the reconstructed geometric data of the third-scale current frame point cloud; together with the implicit feature data of the third-scale reference frame point cloud, it is fed to the compensator for point cloud inter-frame compensation to obtain the implicit feature data of the third-scale current frame point cloud. Sparse-convolution-based upsampling and probability prediction are then performed on the implicit feature data of the third-scale current frame point cloud to obtain the occupancy probabilities of the voxels in the second-scale current frame point cloud, and point cloud clipping is performed according to these occupancy probabilities to obtain the reconstructed geometric data of the second-scale current frame point cloud. The implicit feature data of the third-scale reference frame point cloud may be obtained from the reconstructed geometric data of the second-scale reference frame point cloud.
The process of obtaining the reconstructed geometric data of the first-scale current frame point cloud is similar: the reconstructed geometric data of the second-scale current frame point cloud and the implicit feature data of the second-scale reference frame point cloud are fed together to the compensator for point cloud inter-frame compensation, yielding the implicit feature data of the second-scale current frame point cloud. Sparse-convolution-based upsampling and probability prediction are then performed on the implicit feature data of the second-scale current frame point cloud to obtain the occupancy probabilities of the voxels in the first-scale current frame point cloud, and point cloud clipping is performed according to these occupancy probabilities to obtain the reconstructed geometric data of the first-scale current frame point cloud. The implicit feature data of the second-scale reference frame point cloud may use the reconstructed geometric data of the second-scale reference frame point cloud.
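The lossy decoding loop can be sketched in the same style as the lossless loop above, with point cloud clipping (e.g. prune_by_threshold or prune_with_count from the earlier sketches) replacing entropy decoding of the occupancy; again, every callable name and signature is an assumption, not the disclosure's API.

```python
def decode_lossy(bitstream, ref_feats, num_scales,
                 entropy_decode_geometry, compensate, upsample_and_predict, prune):
    # Scale-N geometry is still transmitted losslessly.
    cur = entropy_decode_geometry(bitstream, scale=num_scales)
    # Walk from coarse (scale N) to fine (scale 1), clipping instead of decoding X_i.
    for s in range(num_scales, 1, -1):
        feat = compensate(cur, ref_feats[s])        # inter-frame compensation at scale s
        prob = upsample_and_predict(feat)            # occupancy probabilities at scale s-1
        cur = prune(prob)                            # thresholding or count-assisted clipping
    return cur  # reconstructed geometry at the first (finest) scale
```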
In an example of this embodiment, the sparse-convolution-based voxel upsampling is implemented by an upsampler that includes a transposed sparse convolution layer with a stride of 2×2×2 and one or more residual layers arranged before and after the transposed sparse convolution layer, see fig. 10. In another example of this embodiment, the sparse-convolution-based voxel upsampling may also be implemented using a neural network as shown in fig. 11.
In an example of this embodiment, the probability prediction is implemented by a plurality of sparse convolution layers and a sigmoid function. For example, probability prediction may be implemented using a probability predictor as shown in fig. 12.
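As an illustration of the structures described for figs. 10 and 12, the following sketch shows how such an upsampler and probability predictor might be assembled with the MinkowskiEngine sparse-convolution library; the class names, channel widths, layer counts and the choice of MinkowskiEngine are assumptions for illustration, not the exact configuration of the disclosure.

```python
import torch
import MinkowskiEngine as ME

class ResidualBlock(torch.nn.Module):
    """Stride-1 sparse residual block (illustrative)."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = ME.MinkowskiConvolution(channels, channels, kernel_size=3, dimension=3)
        self.conv2 = ME.MinkowskiConvolution(channels, channels, kernel_size=3, dimension=3)
        self.relu = ME.MinkowskiReLU()

    def forward(self, x):
        out = self.conv2(self.relu(self.conv1(x)))
        return self.relu(out + x)  # skip connection on the shared coordinate set


class UpsamplerPredictor(torch.nn.Module):
    """Upsampler (residual layers around a stride-2 transposed sparse conv)
    followed by a small sparse-convolution probability head."""
    def __init__(self, channels: int = 32):
        super().__init__()
        self.pre = ResidualBlock(channels)
        # Transposed sparse convolution with stride 2 doubles the spatial resolution,
        # producing candidate child voxels of the next (finer) scale.
        self.up = ME.MinkowskiConvolutionTranspose(channels, channels,
                                                   kernel_size=2, stride=2, dimension=3)
        self.post = ResidualBlock(channels)
        self.head = torch.nn.Sequential(
            ME.MinkowskiConvolution(channels, channels // 2, kernel_size=3, dimension=3),
            ME.MinkowskiReLU(),
            ME.MinkowskiConvolution(channels // 2, 1, kernel_size=3, dimension=3),
        )

    def forward(self, feat: ME.SparseTensor):
        x = self.post(self.up(self.pre(feat)))
        logits = self.head(x)
        # Candidate voxel coordinates are logits.C; their occupancy probabilities:
        return torch.sigmoid(logits.F)
```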
In the point cloud decoding method of this embodiment, inter-frame compensation does not first estimate motion vectors between the reference frame and the current frame and then compensate according to those motion vectors, as in conventional methods. Instead, the implicit feature data of the reference frame point cloud is transformed into the implicit feature data of the current frame point cloud through a sparse convolution operation that accompanies the inter-frame transformation, so the processing is simpler and more efficient, and good coding performance can be obtained.
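A minimal sketch of this idea is given below, assuming the MinkowskiEngine library and, in particular, its ability to evaluate a sparse convolution on externally supplied output coordinates; the class name, kernel size of 5 and channel widths are illustrative, and the additional detail of forcing the centre-voxel feature to 1 is only noted in a comment rather than implemented.

```python
import torch
import MinkowskiEngine as ME

class Compensator(torch.nn.Module):
    """Sketch of inter-frame compensation as a cross-frame sparse convolution:
    the input features live on the reference-frame voxels, but the outputs are
    generated on the current-frame voxel coordinates, so each occupied
    current-frame voxel acts as a kernel centre that aggregates reference-frame
    features from its spatial neighbourhood."""
    def __init__(self, in_channels: int = 32, out_channels: int = 32):
        super().__init__()
        # A kernel size of at least 5 keeps moderate inter-frame motion inside
        # the receptive field of a single convolution.
        self.conv = ME.MinkowskiConvolution(in_channels, out_channels,
                                            kernel_size=5, dimension=3)

    def forward(self, ref_feat: ME.SparseTensor, cur_coords: torch.Tensor) -> ME.SparseTensor:
        # cur_coords: int tensor of shape (N, 4) = (batch index, x, y, z) of the
        # occupied current-frame voxels; in practice both frames should share one
        # coordinate manager. The variant that additionally fixes the implicit
        # feature at the kernel-centre voxel to 1 is not shown here.
        return self.conv(ref_feat, coordinates=cur_coords)
```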
According to the point cloud encoding method and decoding method described above, the implicit feature data of the reference frame point cloud is extracted by a sparse-convolution-based downsampler, the implicit feature information of the reference frame is embedded into the current frame point cloud of the corresponding scale by the compensator, the compensated implicit feature data of the low-scale current frame point cloud is input to the sparse-convolution-based upsampler and probability predictor, the occupancy probabilities of the voxels in the high-scale current frame point cloud are output, and the reconstructed geometric data of the high-scale current frame point cloud is obtained by entropy coding or point cloud clipping. Unlike conventional inter-frame residual coding, the point cloud encoding and decoding methods of the embodiments of the present disclosure use the compensated features directly to reconstruct the current frame. The neural network used has a simple structure and high performance, can be applied to both lossy and lossless compression, and achieves a significant gain over the conventional MPEG G-PCC method and intra-frame compression techniques.
The point cloud encoding and decoding methods of the embodiments of the present disclosure constitute a dynamic (inter-frame) compression algorithm. Its performance is compared below with the conventional MPEG G-PCC method and with a purely static (intra-frame) compression algorithm:
bpp              MPEG G-PCC    Intra-frame coding    Inter-frame coding    Inter-frame compression gain
8iVFB_vox10      1.029         0.574 (-44.2%)        0.512 (-49.7%)        -10.8%
Owlii_vox10      0.817         0.592 (-27.6%)        0.394 (-48.2%)        -33.4%
bpp in the table indicates the number of bits allocated per pixel (bits per pixel) during encoding; 8iVFB_vox10 and Owlii_vox10 are the point cloud sequences used for testing. As shown in the table, compared with a purely static compression algorithm, the dynamic compression algorithm provided by the embodiments of the present disclosure saves 10%-30% of the bit rate, and compared with the conventional MPEG G-PCC algorithm it saves nearly 50% of the bit rate.
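As a consistency check on the table (an observation, not stated in the source), the inter-frame compression gain column appears to be computed relative to intra-frame coding rather than to MPEG G-PCC: 0.512 / 0.574 − 1 ≈ −10.8% for 8iVFB_vox10 and 0.394 / 0.592 − 1 ≈ −33.4% for Owlii_vox10.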
An embodiment of the present disclosure further provides a point cloud code stream, where the code stream is obtained according to the point cloud coding method described in any embodiment of the present disclosure.
An embodiment of the present disclosure further provides a point cloud inter-frame compensation device, as shown in fig. 16, including a processor 5 and a memory 6 storing a computer program, where the processor 5 can implement the point cloud inter-frame compensation method according to any embodiment of the present disclosure when executing the computer program.
An embodiment of the present disclosure further provides a point cloud decoder, referring to fig. 16, including a processor and a memory storing a computer program, where the processor can implement the point cloud decoding method according to any embodiment of the present disclosure when executing the computer program.
An embodiment of the present disclosure further provides a point cloud encoder, referring to fig. 16, including a processor and a memory storing a computer program, where the processor can implement the point cloud encoding method according to any embodiment of the present disclosure when executing the computer program.
An embodiment of the present disclosure further provides a point cloud encoding and decoding system, which includes the point cloud encoder described in any embodiment of the present disclosure, and the point cloud decoder described in any embodiment of the present disclosure.
The processor of the above embodiments of the present disclosure may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), a microprocessor, and the like, or another conventional processor; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), discrete logic or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or a combination thereof. That is, the processor of the above embodiments may be any processing device, or combination of devices, that implements the methods, steps and logical blocks disclosed in the embodiments of the present disclosure. If the disclosed embodiments are implemented partly in software, the instructions for the software may be stored in a suitable non-volatile computer-readable storage medium and executed in hardware using one or more processors to carry out the methods of the disclosed embodiments.
The apparatus and system of the above embodiments of the present disclosure may be implemented on a computing device such as a terminal or a server. The terminal may include mobile terminals such as mobile phones, tablet computers, notebook computers, palmtop computers, personal digital assistants (PDA), portable media players (PMP), navigation devices, wearable devices, smart bracelets and pedometers, as well as fixed terminals such as digital TVs and desktop computers.
An embodiment of the present disclosure further provides a non-transitory computer readable storage medium storing a computer program, where the computer program, when executed by a processor, is capable of implementing a point cloud inter-frame compensation method according to any embodiment of the present disclosure, or is capable of implementing a point cloud decoding method according to any embodiment of the present disclosure, or is capable of implementing a point cloud encoding method according to any embodiment of the present disclosure.
In one or more exemplary embodiments, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium, and executed by a hardware-based processing unit. The computer-readable medium may comprise a computer-readable storage medium corresponding to a tangible medium, such as a data storage medium, or a communication medium that facilitates transfer of a computer program from one place to another, such as according to a communication protocol. In this manner, a computer-readable medium may generally correspond to a non-transitory tangible computer-readable storage medium or a communication medium such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementing the techniques described in this disclosure. The computer program product may include a computer-readable medium.
By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Moreover, any connection may also be termed a computer-readable medium; for example, if the instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be appreciated, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory tangible storage media. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
The instructions may be executed by one or more processors, such as one or more Digital Signal Processors (DSPs), general purpose microprocessors, application Specific Integrated Circuits (ASICs), field programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Thus, the term "processor" as used herein may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein. Additionally, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques may be fully implemented in one or more circuits or logic elements.
The technical solutions of the embodiments of the present disclosure may be implemented in a wide variety of devices or apparatuses, including wireless handsets, integrated Circuits (ICs), or a set of ICs (e.g., a chipset). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the described techniques, but do not necessarily require realization by different hardware units. Rather, as described above, the various units may be combined in a codec hardware unit or provided by a collection of interoperable hardware units (including one or more processors as described above) in combination with suitable software and/or firmware.

Claims (22)

  1. A point cloud inter-frame compensation method, comprising:
    based on the geometric data of the current frame point cloud, carrying out sparse convolution operation accompanying the inter-frame transformation on the implicit characteristic data of the reference frame point cloud, and obtaining the implicit characteristic data of the current frame point cloud according to the result of the sparse convolution operation accompanying the inter-frame transformation;
    The sparse convolution operation accompanied by the inter-frame transformation is to take occupied voxels in the current frame point cloud as the center of a convolution kernel, and carry out convolution operation on implicit characteristic data on adjacent voxels in the reference frame point cloud.
  2. The method of claim 1, wherein:
    And setting the implicit characteristic data on the voxel where the center of the convolution kernel is positioned as 1 when carrying out the convolution operation.
  3. The method of claim 1, wherein:
    the size of the convolution kernel in each of the three dimensions is greater than or equal to 5.
  4. A point cloud decoding method is applied to a point cloud decoder and comprises the following steps:
    Decoding a geometric code stream to obtain reconstructed geometric data C_N of the current frame point cloud of the N-th scale, wherein the geometric data of the current frame point cloud of the N-th scale is obtained by performing voxel downsampling N-1 times on the geometric data of the current frame point cloud of the first scale, and N is an integer greater than or equal to 2;
    Sequentially obtaining reconstruction geometric data of the current frame point cloud from the N-1 scale to the first scale, wherein the reconstruction geometric data of the current frame point cloud of the i scale is obtained by the following steps:
    Obtaining implicit characteristic data of the i+1th-scale current frame point cloud according to the method of any one of claims 1 to 3 according to the reconstructed geometric data of the i+1th-scale current frame point cloud and the implicit characteristic data of the i+1th-scale reference frame point cloud;
    Performing sparse convolution-based voxel upsampling and probability prediction on the implicit characteristic data of the (i+1)-th-scale current frame point cloud to obtain the occupancy probability P_i of the voxels in the i-th-scale current frame point cloud; and determining the reconstructed geometric data of the current frame point cloud of the i-th scale according to P_i, where i = N-1, N-2, ..., 1.
  5. The method of claim 4, wherein:
    The implicit characteristic data of the i+1st scale reference frame point cloud is reconstruction geometric data of the i+1st scale reference frame point cloud.
  6. The method of claim 4, wherein:
    The implicit characteristic data of the i+1th scale reference frame point cloud is obtained by performing voxel downsampling on the reconstruction geometrical data of the first scale reference frame point cloud for i times based on sparse convolution, or
    The implicit characteristic data of the reference frame point cloud with the (i+1) th scale is obtained by performing voxel downsampling on the reconstructed geometric data of the reference frame point cloud with the (i) th scale for 1 time based on sparse convolution.
  7. The method of claim 4, wherein:
    The determining the reconstruction geometric data of the point cloud of the current frame of the ith scale according to P_i comprises the following steps:
    And obtaining reconstruction geometric data of the current frame point cloud of the ith scale through point cloud clipping according to the occupation probability of the voxels in the current frame point cloud of the ith scale.
  8. The method of claim 7, wherein:
    Decoding the geometric code stream to obtain the number K_i of occupied voxels in the current frame point cloud of the ith scale;
    The determining the reconstruction geometric data of the point cloud of the current frame of the ith scale according to P_i comprises the following steps:
    Dividing the M voxels obtained by decomposing the same voxel in the current frame point cloud of the ith scale into a group, setting the occupancy probability of the m voxels with the highest occupancy probability in each group of voxels to 1, then sorting the occupancy probabilities of all the voxels in the current frame point cloud of the ith scale, and determining the K_i voxels with the highest occupancy probability as occupied voxels, wherein 1 ≤ m < M.
  9. The method of claim 4, wherein:
    Decoding the code stream to obtain encoded data X_i of the current frame point cloud of the ith scale, wherein X_i is obtained by entropy encoding the occupancy of the voxels in the current frame point cloud of the ith scale according to the occupancy probability of the voxels in the current frame point cloud of the ith scale;
    The determining the reconstructed geometric data of the current frame point cloud of the ith scale according to P_i comprises: performing entropy decoding according to the occupancy probability of the voxels in the current frame point cloud of the ith scale and the encoded data X_i to obtain the occupancy of the voxels in the current frame point cloud of the ith scale.
  10. The method of claim 4, wherein:
    The voxel upsampling based on sparse convolution is implemented by a voxel upsampler comprising a transposed sparse convolution layer with a stride of 2×2×2, and one or more residual layers arranged before and after the transposed sparse convolution layer.
  11. The method of claim 4, wherein:
    the probability prediction is implemented by a plurality of sparse convolution layers and a sigmoid function.
  12. A point cloud encoding method applied to a point cloud encoder, comprising:
    Performing N-1 times of voxel downsampling on the geometric data of the current frame point cloud of the first scale to obtain the geometric data of the current frame point cloud of the second scale to the N scale;
    And carrying out entropy coding on the geometric data of the current frame point cloud of the N scale.
  13. The method of claim 12, wherein the method further comprises:
    Determining encoded data of the current frame point cloud from the (N-1)-th scale to the first scale and performing entropy encoding, wherein the encoded data X_i of the current frame point cloud of the i-th scale is determined in the following manner, i = N-1, N-2, ..., 1:
    Obtaining implicit characteristic data of the current frame point cloud of the (i+1) -th scale according to the geometrical data of the current frame point cloud of the (i+1) -th scale and the implicit characteristic data of the reference frame point cloud of the (i+1) -th scale by the point cloud inter-frame compensation method according to any one of claims 1 to 3;
    performing sparse convolution-based voxel upsampling and probability prediction on implicit characteristic data of the current frame point cloud of the (i+1) -th scale to obtain the occupation probability of voxels in the current frame point cloud of the (i) -th scale;
    And entropy coding the occupation symbol of the voxel in the current frame point cloud of the ith scale according to the occupation probability of the voxel in the current frame point cloud of the ith scale, and generating coding data X_i of the current frame point cloud of the ith scale.
  14. The method of claim 13, wherein:
    the implicit characteristic data of the i+1th scale reference frame point cloud is obtained by performing voxel downsampling on the geometric data of the first scale reference frame point cloud for i times based on sparse convolution, or
    The implicit characteristic data of the reference frame point cloud with the (i+1) th scale is obtained by performing 1-time voxel downsampling on the geometric data of the reference frame point cloud with the (i) th scale based on sparse convolution.
  15. The method of claim 14, wherein:
    The voxel downsampling based on sparse convolution is achieved by a downsampler comprising a sparse convolution layer with a stride of 2×2×2, and one or more residual layers arranged before and after the sparse convolution layer.
  16. The method of claim 14, wherein:
    The voxel upsampling based on sparse convolution is realized by an upsampler comprising a transposed sparse convolution layer with a stride of 2×2×2, and one or more residual layers arranged before and after the transposed sparse convolution layer.
  17. A point cloud code stream, wherein the code stream is obtained according to the point cloud coding method as claimed in any one of claims 12 to 16.
  18. A point cloud inter-frame compensation device comprising a processor and a memory storing a computer program, wherein the processor is capable of implementing the point cloud inter-frame compensation method according to any of claims 1 to 3 when executing the computer program.
  19. A point cloud decoder comprising a processor and a memory storing a computer program, wherein the processor is capable of implementing the point cloud decoding method according to any of claims 4 to 11 when executing the computer program.
  20. A point cloud encoder comprising a processor and a memory storing a computer program, wherein the processor is capable of implementing the point cloud encoding method of any of claims 12 to 16 when executing the computer program.
  21. A point cloud codec system comprising the point cloud encoder of claim 20 and the point cloud decoder of claim 19.
  22. A non-transitory computer readable storage medium storing a computer program, wherein the computer program, when executed by a processor, is capable of implementing the point cloud inter-frame compensation method according to any one of claims 1 to 3, or the point cloud decoding method according to any one of claims 4 to 11, or the point cloud encoding method according to any one of claims 12 to 16.
CN202280096827.XA 2022-07-12 2022-07-12 A point cloud inter-frame compensation method, encoding and decoding method, device and system Pending CN119366184A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/105286 WO2024011427A1 (en) 2022-07-12 2022-07-12 Point cloud inter-frame compensation method and apparatus, point cloud encoding method and apparatus, point cloud decoding method and apparatus, and system

Publications (1)

Publication Number Publication Date
CN119366184A true CN119366184A (en) 2025-01-24

Family

ID=89535270

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202280096827.XA Pending CN119366184A (en) 2022-07-12 2022-07-12 A point cloud inter-frame compensation method, encoding and decoding method, device and system

Country Status (3)

Country Link
CN (1) CN119366184A (en)
TW (1) TW202408237A (en)
WO (1) WO2024011427A1 (en)


Also Published As

Publication number Publication date
WO2024011427A1 (en) 2024-01-18
TW202408237A (en) 2024-02-16


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination