WO2024060161A1 - Encoding method, decoding method, encoder, decoder and storage medium - Google Patents
- Publication number: WO2024060161A1
- Application: PCT/CN2022/120683
- Authority: WIPO (PCT)
- Prior art keywords: network, layer, layer node, node, latent variable
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/13—Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
Definitions
- the embodiments of the present application relate to the field of point cloud compression technology, and in particular, to a coding and decoding method, an encoder, a decoder, and a storage medium.
- OctSqueeze is a common point cloud entropy model used in Laser Radar (LIDAR).
- OctSqueeze is a LIDAR point cloud octree entropy model based on ancestor nodes
- OctAttention is an expansion of OctSqueeze technology.
- the OctSqueeze scheme assumes that neighbor nodes are conditionally independent given their parent nodes. This assumption usually does not hold and degrades encoding and decoding efficiency.
- in the OctAttention scheme, the model needs to be run repeatedly to update the semantic information of the context window. This process cannot be parallelized and therefore has extremely high time complexity.
- the embodiments of the present application provide a coding and decoding method, an encoder, a decoder and a storage medium, which can improve the efficiency of point cloud coding and decoding while improving the compression performance.
- embodiments of the present application provide an encoding method, applied to an encoder.
- the encoder includes a first embedding network, an encoding network, and a first decoding network.
- the method includes:
- Input the geometric information of the i-th layer node obtained by dividing the point cloud according to the octree into the first embedding network, and determine the embedding features of the i-th layer node; wherein i is greater than 2;
- input the embedding features of the i-th layer node into the encoding network to determine the latent variable of the i-th layer node; determine the reconstructed value of the residual of the i-th layer node according to the latent variable of the i-th layer node, and determine the reconstructed value of the latent variable of the i-th layer node according to the reconstructed value of the residual of the i-th layer node;
- the reconstructed value of the latent variable of the i-th layer node is input to the first decoding network to generate a code stream of the geometric information of the i-th layer node.
- embodiments of the present application provide a decoding method, applied to a decoder, where the decoder includes a second decoding network, and the method includes:
- decode the code stream to determine the reconstructed value of the residual of the i-th layer node obtained by dividing the point cloud according to the octree; wherein i is greater than 2; determine the reconstructed value of the latent variable of the i-th layer node according to the reconstructed value of the residual of the i-th layer node;
- the reconstructed value of the latent variable of the i-th layer node is input to the second decoding network to determine the geometric information of the i-th layer node.
- embodiments of the present application provide an encoder, which includes a first determining unit, a coding unit, and a generating unit; wherein,
- the first determination unit is configured to input the geometric information of the i-th layer node obtained by dividing the point cloud according to the octree into the first embedding network to determine the embedding features of the i-th layer node; wherein i is greater than 2; input the embedding features of the i-th layer node into the encoding network to determine the latent variables of the i-th layer node; determine the reconstructed value of the residual of the i-th layer node according to the latent variable of the i-th layer node, and determine the reconstructed value of the latent variable of the i-th layer node according to the reconstructed value of the residual of the i-th layer node;
- the coding unit is configured to write the reconstructed value of the residual of the i-th layer node into the code stream;
- the generating unit is configured to input the reconstructed value of the latent variable of the i-th layer node to the first decoding network and generate a code stream of the geometric information of the i-th layer node.
- embodiments of the present application provide an encoder, which includes a first memory and a first processor; wherein,
- the first memory is used to store a computer program capable of running on the first processor
- the first processor is configured to execute the encoding method as described above when running the computer program.
- embodiments of the present application provide a decoder, which includes a decoding unit and a second determination unit; wherein,
- the decoding unit is configured to decode the code stream
- the second determination unit is configured to determine the reconstructed value of the residual of the i-th layer node obtained by dividing the point cloud according to the octree, wherein i is greater than 2; determine the reconstructed value of the latent variable of the i-th layer node based on the reconstructed value of the residual of the i-th layer node; and input the reconstructed value of the latent variable of the i-th layer node to the second decoding network to determine the geometric information of the i-th layer node.
- embodiments of the present application provide a decoder, the decoder including a second memory and a second processor; wherein,
- the second memory is used to store a computer program capable of running on the second processor
- the second processor is configured to execute the decoding method as described above when running the computer program.
- an embodiment of the present application provides a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and when the computer program is executed, it implements the decoding method as described above, or implements the encoding method as described above.
- Embodiments of the present application provide an encoding and decoding method, an encoder, a decoder, and a storage medium.
- the encoder includes a first embedding network, an encoding network, and a first decoding network.
- the encoder converts point clouds into octrees.
- the geometric information of the i-th layer node obtained by the division is input to the first embedding network, and the embedding features of the i-th layer node are determined, where i is greater than 2; the embedding features of the i-th layer node are input to the encoding network to determine the latent variable of the i-th layer node; the reconstructed value of the residual of the i-th layer node is determined according to the latent variable of the i-th layer node, and the reconstructed value of the latent variable of the i-th layer node is determined according to the reconstructed value of the residual of the i-th layer node; the reconstructed value of the latent variable of the i-th layer node is input to the first decoding network to generate a code stream of the geometric information of the i-th layer node.
- the decoder includes a second decoding network.
- the decoder decodes the code stream and determines the reconstructed value of the residual of the i-th layer node of the point cloud, where i is greater than 2; the reconstructed value of the latent variable of the i-th layer node is determined according to the reconstructed value of the residual of the i-th layer node; the reconstructed value of the latent variable of the i-th layer node is input to the second decoding network to determine the geometric information of the i-th layer node.
- in this way, latent variables can be used to represent the spatial correlation between nodes at the same level of the octree, ultimately achieving higher compression performance; at the same time, based on the constructed network, encoding and decoding are performed in parallel on nodes at the same level of the octree, which improves encoding and decoding efficiency and further improves compression performance.
- Figure 1 is a schematic diagram of a point cloud encoding and decoding network architecture provided by an embodiment of the present application
- Figure 2 is a first schematic flowchart of the implementation of the encoding method proposed in the embodiment of the present application
- Figure 3 is a schematic diagram of the network structure of the embedded network
- Figure 4 is a schematic diagram of the implementation of the embedded network
- Figure 5 is a schematic diagram of the network structure of the coding network
- Figure 6 is a schematic diagram of the implementation of the encoding network
- Figure 7 is a schematic diagram of the network structure of the subtraction network
- Figure 8 is a schematic diagram of the implementation of the subtraction network
- Figure 9 is a schematic diagram of the network structure of the addition network
- Figure 10 is a schematic diagram of the implementation of the addition network
- Figure 11 is a schematic diagram of the network structure of the decoding network
- Figure 12 is a schematic diagram of the implementation of the decoding network
- Figure 13 is a second schematic flowchart of the implementation of the encoding method proposed in the embodiment of the present application
- Figure 14 is a second schematic diagram of the network structure of the decoding network
- Figure 15 is a second schematic diagram of the implementation of the decoding network
- Figure 16 is a schematic diagram of the overall framework for executing the encoding method
- Figure 17 is a first schematic flowchart of the implementation of the decoding method proposed in the embodiment of the present application
- Figure 18 is a second schematic flowchart of the implementation of the decoding method proposed in the embodiment of the present application
- Figure 19 is a schematic diagram of the overall framework for executing the decoding method
- Figure 20 is a first schematic diagram of the structure of the encoder
- Figure 21 is a second schematic diagram of the structure of the encoder
- Figure 22 is a first schematic diagram of the structure of the decoder
- Figure 23 is a second schematic diagram of the structure of the decoder
- the terms "first", "second" and "third" involved in the embodiments of this application are only used to distinguish similar objects and do not represent a specific ordering of objects. It is understandable that "first", "second" and "third" may be interchanged in a specific order or sequence where permitted, so that the embodiments of the application described herein can be implemented in an order other than that illustrated or described herein.
- Point Cloud is a three-dimensional representation of the surface of an object.
- through collection equipment such as photoelectric radar, lidar, laser scanners, and multi-view cameras, the point cloud (data) of the surface of an object can be collected.
- Point cloud is a set of discrete points randomly distributed in space that expresses the spatial structure and surface properties of a three-dimensional object or scene.
- Figure 1A shows a three-dimensional point cloud image
- Figure 1B shows a partial enlargement of the three-dimensional point cloud image. It can be seen that the point cloud surface is composed of densely distributed points.
- Two-dimensional images have information expression at each pixel point, and the distribution is regular, so there is no need to record its position information additionally; however, the distribution of points in point clouds in three-dimensional space is random and irregular, so it is necessary to record the position of each point in space in order to fully express a point cloud.
- each position in the acquisition process has corresponding attribute information, usually RGB color values, and the color value reflects the color of the object; for point clouds, in addition to color information, the attribute information corresponding to each point is also commonly the reflectance value, which reflects the surface material of the object. Therefore, the points in the point cloud can include the position information of the point and the attribute information of the point.
- the position information of the point can be the three-dimensional coordinate information (x, y, z) of the point.
- the position information of the point can also be called the geometric information of the point.
- the attribute information of the point can include color information (three-dimensional color information) and/or reflectance (one-dimensional reflectance information r), etc.
- the color information can be information on any color space.
- the color information can be RGB information. Among them, R represents red (Red, R), G represents green (Green, G), and B represents blue (Blue, B).
- the color information may be luminance and chrominance (YCbCr, YUV) information, where Y represents brightness (Luma), Cb (U) represents blue color difference, and Cr (V) represents red color difference.
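As a concrete illustration of the YCbCr description above, the following sketch converts one RGB sample to luminance and the two color differences. The coefficients follow the common ITU-R BT.601 convention; the patent does not mandate any particular conversion matrix, so this is an assumed example, not the method of the application.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one RGB sample (values in [0, 255]) to YCbCr.

    Assumed convention: ITU-R BT.601 analog form, with Cb/Cr offset by 128
    so all three outputs also fall in [0, 255].
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b   # luminance (Luma)
    cb = 128 + 0.564 * (b - y)              # blue color difference
    cr = 128 + 0.713 * (r - y)              # red color difference
    return y, cb, cr
```

For a pure white sample (255, 255, 255), the luminance is 255 and both color differences sit at the 128 midpoint, matching the intuition that white carries no chroma.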
- the points in the point cloud can include the three-dimensional coordinate information of the point and the reflectivity value of the point.
- the points in the point cloud may include the three-dimensional coordinate information of the point and the three-dimensional color information of the point.
- a point cloud is obtained by combining the principles of laser measurement and photogrammetry. The points in the point cloud may include the three-dimensional coordinate information of the point, the reflectivity value of the point, and the three-dimensional color information of the point.
- Point clouds can be divided into:
- Static point cloud: the object is stationary, and the device that obtains the point cloud is also stationary;
- Dynamic point cloud: the object is moving, but the device that obtains the point cloud is stationary;
- Dynamically acquired point cloud: the device that acquires the point cloud is in motion.
- point clouds are divided into two categories according to their uses:
- Category 1: machine perception point cloud, which can be used in scenarios such as autonomous navigation systems, real-time inspection systems, geographic information systems, visual sorting robots, and rescue and disaster relief robots;
- Category 2: human eye perception point cloud, which can be used in point cloud application scenarios such as digital cultural heritage, free-viewpoint broadcasting, three-dimensional immersive communication, and three-dimensional immersive interaction.
- Point clouds can flexibly and conveniently express the spatial structure and surface properties of three-dimensional objects or scenes, and because point clouds are obtained by directly sampling real objects, they can provide a strong sense of reality while ensuring accuracy. They are therefore widely used, in fields including virtual reality games, computer-aided design, geographic information systems, automatic navigation systems, digital cultural heritage, free-viewpoint broadcasting, three-dimensional immersive telepresence, and three-dimensional reconstruction of biological tissues and organs.
- Point cloud collection mainly has the following methods: computer generation, 3D laser scanning, 3D photogrammetry, etc.
- Computers can generate point clouds of virtual three-dimensional objects and scenes; 3D laser scanning can obtain point clouds of static real-world three-dimensional objects or scenes, acquiring millions of points per second; 3D photogrammetry can obtain point clouds of dynamic real-world three-dimensional objects or scenes, acquiring tens of millions of points per second.
- the number of points in each frame of the point cloud is 700,000, and each point has coordinate information xyz (float) and color information RGB (uchar)
- the data volume of 1280×720 2D video at 24fps for 10s is about 1280 × 720 × 12bit × 24fps × 10s ≈ 0.33GB
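The two data-volume figures above can be checked with a short calculation. The per-frame point cloud size assumes float = 4 bytes and uchar = 1 byte (these sizes are conventional assumptions, not stated explicitly above):

```python
# Rough data-volume comparison between an uncompressed point cloud frame
# and the 2D video example above.
# Assumed element sizes: float = 4 bytes, uchar = 1 byte.

def point_cloud_frame_bytes(points=700_000):
    xyz = 3 * 4   # three float coordinates per point
    rgb = 3 * 1   # three uchar color components per point
    return points * (xyz + rgb)

def video_bytes(width=1280, height=720, bits_per_pixel=12, fps=24, seconds=10):
    return width * height * bits_per_pixel * fps * seconds // 8

print(point_cloud_frame_bytes())   # 10_500_000 bytes, i.e. about 10 MB per frame
print(video_bytes())               # 331_776_000 bytes, i.e. about 0.33 GB
```

So a single 700,000-point frame already costs about 10 MB, while ten full seconds of 720p video cost about 0.33 GB, which is why uncompressed point cloud sequences quickly become unmanageable.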
- point cloud compression has become a key issue to promote the development of the point cloud industry.
- since the point cloud is a collection of massive points, storing the point cloud not only consumes a lot of memory but is also not conducive to transmission; there is no bandwidth large enough at the network layer to support direct transmission of the point cloud without compression. Therefore, the point cloud needs to be compressed.
- the point cloud coding framework that can compress point clouds can be the Geometry-based Point Cloud Compression (G-PCC) codec framework or the Video-based Point Cloud Compression (V-PCC) codec framework provided by the Moving Picture Experts Group (MPEG), or the AVS-PCC codec framework provided by AVS.
- the G-PCC encoding and decoding framework can be used to compress the first type of static point cloud and the third type of dynamically acquired point cloud
- the V-PCC encoding and decoding framework can be used to compress the second type of dynamic point cloud.
- the G-PCC encoding and decoding framework is also called point cloud codec TMC13
- the V-PCC encoding and decoding framework is also called point cloud codec TMC2.
- FIG. 1 is a schematic diagram of the network architecture of a point cloud encoding and decoding system provided by an embodiment of the present application.
- the network architecture includes one or more electronic devices 13 to 1N and a communication network 01, wherein the electronic devices 13 to 1N can perform video interaction through the communication network 01.
- electronic devices may be various types of devices with point cloud encoding and decoding functions.
- the electronic devices may include mobile phones, tablet computers, personal computers, personal digital assistants, navigators, digital phones, video phones, televisions, sensing equipment, servers, etc., which are not limited by the embodiments of this application.
- the decoder or encoder in the embodiment of the present application can be the above-mentioned electronic device.
- the electronic device in the embodiment of the present application has a point cloud encoding and decoding function, and generally includes a point cloud encoder (ie, encoder) and a point cloud decoder (ie, decoder).
- OctSqueeze is a LIDAR point cloud octree entropy model based on ancestor nodes. This technology is applied to the LIDAR point cloud entropy model and mainly consists of the following parts:
- the encoder first performs octree construction on the point cloud. Then, for each node of the octree, its depth, parent node occupancy code, and coordinates are selected as context information, and its corresponding features are obtained through a multi-layer perceptron network. After that, k iterations are performed. For the k-th iteration, the characteristics of the current node and the characteristics of its parent node are spliced, and the characteristics of the current node at the k-th iteration are obtained through the k-th multi-layer perceptron. For the features after the k-th iteration of each node, the probability of each occupancy code is obtained through a 256-dimensional softmax layer, and the current occupancy code is losslessly encoded using arithmetic coding and the probability of the current occupancy code.
- the code stream of OctSqueeze technology consists of the code stream corresponding to the occupied code of each layer of nodes.
- the decoder decodes the octree sequentially from lower layers to higher layers in units of layers.
- the decoding process is exactly the same as the encoding process except for the node order.
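The OctSqueeze per-node flow described above (context features from depth, parent occupancy code, and coordinates; k iterations splicing with parent features; a 256-way softmax) can be sketched in plain Python. The MLPs here are deterministic stand-ins for the learned multi-layer perceptrons, so only the data flow matches the description, not the actual model:

```python
import math

# Sketch of the OctSqueeze probability model's data flow. The real MLPs are
# learned; `mlp` below is a fixed placeholder so the sketch is runnable.

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mlp(vec, out_dim):
    # Hypothetical fixed "MLP": deterministic mixing of the input vector.
    return [sum(v * ((i + j) % 3 + 1) for j, v in enumerate(vec)) / len(vec)
            for i in range(out_dim)]

def node_probabilities(depth, parent_occupancy, coords, parent_feats, k=2):
    # Context information: node depth, parent occupancy code, coordinates.
    context = [depth, parent_occupancy, *coords]
    feat = mlp(context, 16)                 # feature from the context
    for step in range(k):                   # k iterations
        # Splice current feature with the parent feature for this iteration.
        feat = mlp(feat + parent_feats[step], 16)
    # 256-dimensional softmax: probability of each occupancy code, which an
    # arithmetic coder would then use to losslessly encode the real code.
    return softmax(mlp(feat, 256))
```

The returned distribution sums to one over the 256 possible occupancy codes; in the real scheme it drives the arithmetic coder for the current node.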
- the loss function is the cross entropy of the predicted node probability and the real occupancy code, as follows:
- loss = -Σ_{i=1}^{N} Σ_{j=1}^{256} y_{ij} log(p_{ij})
- N is the number of octree nodes
- x_i represents the i-th node, and y_{ij} is 1 when j equals the real occupancy code of node x_i and 0 otherwise
- j is the dimension index among the 256 dimensions
- p_{ij} is the probability of the corresponding occupancy code
- OctAttention is a LIDAR point cloud octree entropy model based on ancestor sibling nodes. This technology is applied to the LIDAR point cloud entropy model and is an expansion of the OctSqueeze technology. It mainly consists of the following parts:
- This technique encodes and decodes point clouds in breadth-first order, using a context window to store encoded/decoded nodes.
- an attention-based network is used to extract the information of the context window, and the softmax function is used to obtain the probability of each occupied code of the current node. Losslessly encode the current node using arithmetic coding and add the current node to the context window.
- the code stream of this technology consists of the code streams of each node arranged in breadth-first order.
- the decoding process is the same as the encoding process.
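The context-window bookkeeping described above (breadth-first order, a window of already encoded/decoded nodes, the current node appended after coding) can be sketched as follows. The attention network is omitted; the window size and the window contents shown are illustrative stand-ins:

```python
from collections import deque

# Sketch of OctAttention's context-window management. Nodes are visited in
# breadth-first order; the last WINDOW coded nodes form the context that the
# attention network would consume; each coded node then joins the window.

WINDOW = 4   # hypothetical window size; the real model uses a much larger one

def encode_breadth_first(occupancy_codes):
    window = deque(maxlen=WINDOW)   # already-encoded/decoded nodes
    coded = []
    for code in occupancy_codes:    # breadth-first node order
        context = list(window)      # information available to the model
        coded.append((code, context))
        window.append(code)         # add the current node to the window
    return coded

out = encode_breadth_first([1, 5, 5, 9, 3, 7])
```

Because each node's probability depends on the just-updated window, the model must be rerun node by node, which is exactly the serial bottleneck noted earlier for this scheme.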
- the loss function is the cross entropy of the predicted node probability and the real occupancy code, as follows:
- loss = -Σ_{i=1}^{N} Σ_{j=1}^{256} y_{ij} log(p_{ij})
- N is the number of octree nodes
- x_i represents the i-th node, and y_{ij} is 1 when j equals the real occupancy code of node x_i and 0 otherwise
- j is the dimension index among the 256 dimensions
- p_{ij} is the probability of the corresponding occupancy code.
- embodiments of the present application provide a coding and decoding method, an encoder, a decoder, and a storage medium.
- the encoder includes a first embedding network, a coding network, and a first decoding network.
- the encoder inputs the geometric information of the i-th layer node, obtained by dividing the point cloud according to the octree, into the first embedding network to determine the embedding features of the i-th layer node, where i is greater than 2; the embedding features of the i-th layer node are input to the encoding network to determine the latent variable of the i-th layer node; the reconstructed value of the residual of the i-th layer node is determined based on the latent variable of the i-th layer node, and the reconstructed value of the latent variable of the i-th layer node is determined based on the reconstructed value of the residual of the i-th layer node.
- the decoder includes a second decoding network.
- the decoder decodes the code stream and determines the reconstructed value of the residual of the i-th layer node obtained by dividing the point cloud according to the octree, where i is greater than 2; the reconstructed value of the latent variable of the i-th layer node is determined according to the reconstructed value of the residual of the i-th layer node; the reconstructed value of the latent variable of the i-th layer node is input to the second decoding network to determine the geometric information of the i-th layer node.
- in this way, latent variables can be used to represent the spatial correlation between nodes at the same level of the octree, ultimately achieving higher compression performance; at the same time, based on the constructed network, encoding and decoding are performed in parallel on nodes at the same level of the octree, which improves encoding and decoding efficiency and further improves compression performance.
- the embodiment of the present application proposes a point cloud encoding method, which can be applied to an encoder, where the encoder can include a first embedding network, an encoding network, and a first decoding network.
- the encoder may be provided with a multi-layer network, where at least one layer of the multi-layer network may include a first embedding network, an encoding network, and a first decoding network.
- for example, the encoder may be provided with a 5-layer network, where one layer or multiple layers of the 5-layer network may include a first embedding network, a coding network, and a first decoding network.
- Figure 2 is a schematic flow chart of the implementation of the encoding method proposed by the embodiment of the present application. As shown in Figure 2, the method for the encoder to perform encoding processing may include the following steps:
- Step 101 Input the geometric information of the i-th layer node obtained by dividing the point cloud according to the octree into the first embedding network to determine the embedding characteristics of the i-th layer node; where i is greater than 2.
- the encoder can first input the geometric information of the i-th layer node obtained by dividing the point cloud according to the octree into the first embedding network, so that the embedding characteristics of the i-th layer node can be determined.
- i may be an integer greater than 2.
- the i-th layer network in the multi-layer network can perform parallel encoding processing on the geometric information of the i-th layer node.
- when performing geometric encoding processing on the point cloud, an octree may be constructed first.
- the octree structure is used to recursively divide the point cloud space into eight sub-blocks of the same size, and the occupied code words of each sub-block are judged.
- if the sub-block does not contain points, it is recorded as empty; otherwise, it is recorded as non-empty.
- the occupied codeword information of all blocks is recorded in the last layer of recursive division and geometrically encoded; on the one hand, the geometric information expressed through the octree structure can further form a geometric code stream; on the other hand, the geometric information reconstructed during geometric encoding can be used as additional information for attribute encoding.
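The recursive division described above can be sketched concretely: each node covers a cube, the cube is split into eight half-size sub-cubes, and an 8-bit occupancy code records which sub-cubes contain at least one point. This is a simplified stand-alone illustration, not the codec's actual implementation:

```python
# Octree occupancy-code sketch. Points are integer (x, y, z) tuples inside a
# cube of side `size` (a power of two) anchored at `origin`.

def occupancy_code(points, origin, size):
    """Return the 8-bit occupancy code and the points of each child cube."""
    half = size // 2
    children = [[] for _ in range(8)]
    for x, y, z in points:
        idx = (((x >= origin[0] + half) << 2) |
               ((y >= origin[1] + half) << 1) |
               (z >= origin[2] + half))
        children[idx].append((x, y, z))
    code = 0
    for idx, pts in enumerate(children):
        if pts:                  # non-empty sub-block -> its bit is set
            code |= 1 << idx
    return code, children

def build_octree(points, origin=(0, 0, 0), size=8):
    """Breadth-first list of occupancy codes, one per non-empty internal node."""
    codes = []
    queue = [(points, origin, size)]
    while queue:
        pts, org, s = queue.pop(0)
        if s == 1:
            continue             # single voxel: nothing left to divide
        code, children = occupancy_code(pts, org, s)
        codes.append(code)
        half = s // 2
        for idx, child_pts in enumerate(children):
            if child_pts:
                child_org = (org[0] + ((idx >> 2) & 1) * half,
                             org[1] + ((idx >> 1) & 1) * half,
                             org[2] + (idx & 1) * half)
                queue.append((child_pts, child_org, half))
    return codes
```

For two points at opposite corners of an 8-sided cube, the root occupancy code has exactly bits 0 and 7 set (value 129), and each deeper layer keeps exactly one occupied child per branch.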
- the parent node layer can be defined as a high-level layer and the child node layer as a low-level layer; alternatively, the parent node layer can be defined as a low-level layer and the child node layer as a high-level layer.
- the encoding sequence in which the encoder performs the encoding process may be in the order from the root node to the leaf node of the constructed octree.
- the decoding order may be in the order from the leaf nodes to the root node of the constructed octree.
- the root node layer of the constructed octree is the i-th layer
- the leaf node layers are from the i-1 to the first layer in sequence.
- the order of encoding processing is from the i-th layer to the first layer
- the order of decoding processing is the opposite of encoding, that is, from the first layer to the i-th layer.
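The layer ordering described above can be stated compactly: with the root at layer i and the leaves running down to layer 1, encoding visits layers from i down to 1 and decoding visits them in the opposite direction. A trivial sketch:

```python
# Layer-order sketch for the scheme above: root layer = i, leaf layers = i-1..1.

def encoding_order(i):
    """Encoding proceeds from the root layer i down to layer 1."""
    return list(range(i, 0, -1))

def decoding_order(i):
    """Decoding proceeds in the opposite direction, from layer 1 up to layer i."""
    return list(range(1, i + 1))
```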
- the geometric information of the i-th layer nodes of the point cloud can represent the position coordinates of the nodes, and can also represent the occupancy codeword situation of the i-th layer nodes, that is, the occupancy situation (placeholder information).
- the geometric information of the i-th layer node includes both the position coordinate information of the node and the occupancy information of the node.
- the input of the first embedding network is the geometric information of the i-th layer node, where the geometric information of the node as the input of the first embedding network may include both the position coordinate information of the node and the placeholder information of the node; the code stream of the geometric information output by the first decoding network mainly includes the code stream of the placeholder information of the node, and may or may not include the code stream of the node's position coordinate information.
- the embedding network and the encoding network need to refer to both the placeholder information and the position coordinate information of the node, while the decoding network focuses on generating a code stream of the placeholder information.
- the first embedding network may be an occupancy embedding network (embedding).
- the first embedding network can be used to map the octree nodes from the 256-dimensional one-hot vector to the 16-dimensional continuous feature space, so that similar occupancy conditions in the space are also similar in the feature space.
- the first embedding network may include a sparse convolutional network.
- the network structure of the first embedding network may be composed of a multi-layer sparse convolutional network.
- Figure 3 is a schematic diagram of the network structure of the embedding network. As shown in Figure 3, the network structure of the first embedding network can be a three-layer sparse convolutional network.
- after the encoder inputs the geometric information of the i-th layer node in the point cloud to the first embedding network, it can output the embedding features of the i-th layer node.
- the embedding features of the i-th layer node can represent the occupancy status of the i-th layer node after it is mapped to the feature space.
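The mapping described above, from a 256-dimensional one-hot occupancy code to a 16-dimensional continuous feature, is equivalent to a lookup into a learned 256×16 table. The sketch below uses random stand-in weights (the real table is learned by the sparse convolutional embedding network), just to make the shape of the operation concrete:

```python
import random

# Occupancy embedding sketch: one-hot(code) @ W, with W a 256x16 table,
# reduces to a table lookup. Weights here are illustrative placeholders.

random.seed(0)
EMBED_DIM = 16
embedding_table = [[random.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]
                   for _ in range(256)]

def embed_occupancy(code):
    """Map an occupancy code (0..255) to its 16-dimensional embedding feature."""
    return embedding_table[code]
```

Since the table is learned, occupancy codes that describe spatially similar patterns end up with nearby feature vectors, which is the property the embedding network is trained to provide.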
- Figure 4 is a schematic diagram of the implementation of the embedding network.
- x(l) represents the geometric information of the l-th layer node, and e(l) represents the embedding features of the l-th layer node output by the first embedding network.
- for the l-th layer node, the encoder can input the geometric information x(l) of the l-th layer node into the first embedding network, so that the corresponding embedding features e(l) of the l-th layer node can be obtained through the first embedding network.
- Step 102 Input the embedded features of the i-th layer node into the encoding network to determine the latent variables of the i-th layer node.
- after the encoder inputs the geometric information of the i-th layer node in the point cloud to the first embedding network and obtains the embedding features of the i-th layer node, it can further input the embedding features of the i-th layer node into the encoding network to determine the latent variables of the i-th layer node.
- the encoding network may be an encoder network.
- the encoding network can be used to determine the spatial correlation between the latent variables of the previous layer node of the current layer and the embedded features of the current layer node, thereby obtaining the latent variables of the current layer node. That is, through the encoding network, the latent variable of the i-th layer node can be determined based on the embedding characteristics of the i-th layer node output by the first embedding network.
- the latent variable of the i-th layer node can represent the correlation between the i-th layer sibling nodes. That is to say, in this application, the encoder can make full use of hierarchical latent variables to capture the correlation of LIDAR point clouds.
- the encoding network can be implemented through sparse convolution, which may specifically include sparse convolution network, ReLU activation function, initial residual network (Inception Resnet, IRN), and feature fusion network Concatenate .
- the network structure of the encoding network may be composed of a sparse convolution network, a ReLU activation function, an initial residual network, and a Concatenate network.
- Figure 5 is a schematic diagram of the network structure of the encoding network.
- the network structure of the encoding network can be a sparse convolution layer with a convolution kernel size of 2 and a stride of 2, followed by a ReLU activation function, then three IRN layers, and finally, after the Concatenate network, one more sparse convolution layer.
- the encoder can continue to input the embedding features of the i-th layer nodes into the encoding network, and then the latent variables of the i-th layer nodes can be output through the encoding network.
- when the coding network uses the embedded features of the current layer node to determine the latent variables of the current layer node, it needs to combine them with the latent variables of the previous layer node.
- the previous layer of the current layer can be understood as the parent node layer of the current node.
- the latent variable of the node in the previous layer can be input to the current layer.
- the coding network of the current layer can determine the spatial correlation between the latent variables of the previous layer nodes and the embedded features of the current layer nodes, thereby obtaining the latent variables of the current layer nodes.
- the input of the coding network of the i-th layer network may include the latent variable of the i+1-th layer node output by the coding network of the i+1-th layer network.
- the coding network of the i-th layer can also output the latent variables of the i-th layer node to the coding network of the i-1th layer network. That is, the input of the coding network of the i-1th layer network may include the latent variables of the i-th layer node output by the coding network of the i-th layer network.
- the encoder can input the embedded features of the i-th layer node and the latent variables of the i+1th layer node into the encoding network of the i-th layer network, so that the latent variables of the i-th layer node can be determined.
- the embedded features of the i-th layer node are output by the first embedding unit of the i-th layer network
- the latent variables of the i+1-th layer node are output by the encoding network of the i+1-th layer network.
- Figure 6 is a schematic diagram of the implementation of the coding network.
- e (l) represents the embedding feature of the l-th layer node output by the first embedding network
- f (l) Represents the latent variable of the l-th layer node
- f (l+1) represents the latent variable of the l+1-th layer node
- the encoder can explore the spatial correlation between f (l+1) and e (l), and output the latent variable f (l) of the lower layer (layer l).
- a sparse tensor can be used to represent an octree, where the coordinate matrix of the sparse tensor represents the node coordinates of the octree; the attribute matrix of the sparse tensor represents the node occupancy code of the octree.
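As a small illustration of this representation (the variable names and the helper below are hypothetical, not from the application), a sparse tensor can be held as a pair of aligned matrices, with each attribute row carrying the 8-bit child-occupancy code:

```python
import numpy as np

# One row per occupied octree node: its coordinates and its occupancy code.
coords = np.array([[0, 0, 0],
                   [1, 1, 1]])           # N x 3 coordinate matrix
occupancy = np.array([[0b00000011],      # children 0 and 1 occupied
                      [0b10000000]])     # only child 7 occupied

def occupied_children(code):
    # Decode an 8-bit occupancy code into the occupied child indices.
    return [k for k in range(8) if code >> k & 1]

print(occupied_children(0b00000011))  # -> [0, 1]
```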
- the encoding network can be implemented using sparse convolution, where the downsampling of f (l+1) is implemented by a sparse convolution layer with a convolution kernel size of 2 and a stride of 2, followed by a ReLU activation function. After that, the initial residual network is used to aggregate the sibling node features, and the result is output to the Concatenate network for feature fusion with e (l).
- Finally, a sparse convolution layer is used to fuse e (l) with the downsampled f (l+1) to obtain the latent variable of the lower-layer node, that is, the latent variable f (l) of the l-th layer node.
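The data flow just described can be mimicked with dense arrays as below. This is only a rough numpy sketch under heavy assumptions: the stride-2 sparse convolution, the IRN aggregation, and the sparse coordinate handling are all collapsed into plain matrix operations, and `encode_layer` and all shapes are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_layer(e_l, f_parent, w_fuse):
    # Dense stand-in for the sparse encoding network (illustrative only).
    # e_l:      (N, Ce) embedded features of the current-layer nodes
    # f_parent: (N, Cf) parent latents already gathered onto the N child
    #           nodes; the stride-2 sparse convolution that actually does
    #           this down-mapping is reduced to a plain ReLU here.
    down = np.maximum(f_parent, 0.0)             # ReLU after downsampling
    fused = np.concatenate([e_l, down], axis=1)  # Concatenate network
    return fused @ w_fuse                        # final sparse-conv stand-in

e = rng.normal(size=(5, 16))        # 5 nodes, 16-dim embeddings e(l)
f_up = rng.normal(size=(5, 32))     # gathered parent latents f(l+1)
w = rng.normal(size=(16 + 32, 32))  # fusion weights -> 32-dim latent f(l)
f_l = encode_layer(e, f_up, w)
print(f_l.shape)  # (5, 32)
```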
- Step 103 Determine the reconstructed value of the residual of the i-th layer node based on the latent variable of the i-th layer node, and determine the reconstructed value of the latent variable of the i-th layer node based on the reconstructed value of the residual of the i-th layer node.
- after the encoder inputs the embedded features of the i-th layer node into the encoding network and determines the latent variable of the i-th layer node, it can further determine the reconstructed value of the residual of the i-th layer node based on the latent variable of the i-th layer node, and can then determine the reconstructed value of the latent variable of the i-th layer node based on the reconstructed value of the residual of the i-th layer node.
- the reconstructed value of the residual of the i-th layer node can be written into the code stream.
- when the encoder determines the reconstructed value of the residual of the i-th layer node based on the latent variable of the i-th layer node, the encoder may first determine the residual of the i-th layer node based on the predicted value of the latent variable of the i-th layer node and the latent variable of the i-th layer node; the residual of the i-th layer node can then be quantized, and the reconstructed value of the residual of the i-th layer node can be determined.
- the encoder can write the reconstructed value of the residual of the i-th layer node into the code stream and transmit it to the decoder.
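The quantization step can be sketched as simple rounding (an assumption for illustration; the application does not fix a particular quantizer, and `quantize` is a hypothetical stand-in):

```python
import numpy as np

def quantize(residual):
    # Uniform scalar quantization: round to the nearest integer.  The
    # reconstructed residual then differs from the original by at most 0.5.
    return np.round(residual)

r = np.array([0.2, -1.7, 3.49])
r_hat = quantize(r)
print(r_hat)  # [ 0. -2.  3.]
```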
- the encoder can decompose and compress the latent variables using residual coding.
- the predicted value of the latent variable of the i-th layer node can be obtained first.
- the predicted value of the latent variable of the i-th layer node can be predicted by the first decoding network of the i-1th layer network, that is, it can be output by the first decoding network in the i-1th layer network.
- the encoder may also include a subtraction network, where the subtraction network may be used to determine the residual of the latent variable.
- At least one layer of the multi-layer network set by the encoder may also include a subtraction network.
- the subtraction network is a soft subtraction network, which can be used to replace common hard subtraction processing.
- the residual can be determined through the following formula:
- r (l) = g_s(Concatenate(f (l), f̃ (l)))
- where g_s represents the operation of a sparse convolution layer after concatenation, r (l) represents the residual of the l-th layer node, f (l) represents the latent variable of the l-th layer node, and f̃ (l) represents the predicted value of the latent variable of the l-th layer node.
- when the encoder determines the residual of the i-th layer node, the predicted value of the latent variable of the i-th layer node and the latent variable of the i-th layer node can be input into the subtraction network, and the residual of the i-th layer node can then be obtained.
- the subtraction network may include a sparse convolution network and a feature fusion network Concatenate.
- Figure 7 is a schematic diagram of the network structure of the subtraction network.
- the network structure of the subtraction network can be a sparse convolution layer with a convolution kernel size of 3 and a stride of 1, together with a Concatenate network.
- Figure 8 is a schematic diagram of the implementation of the subtraction network.
- f (l) represents the latent variable of the l-th layer node
- r (l) represents the residual of the l-th layer node.
- after f (l) and the predicted value f̃ (l) of the latent variable of the l-th layer node are fused by the Concatenate network, the residual r (l) of the l-th layer node is finally output through the operation of the sparse convolution layer.
- after the encoder performs quantization processing on the obtained residual of the i-th layer node and determines the reconstructed value of the residual of the i-th layer node, it can further determine the reconstructed value of the latent variable of the i-th layer node based on the reconstructed value of the residual of the i-th layer node and the predicted value of the latent variable of the i-th layer node.
- the predicted value of the latent variable of the i-th layer node may be output by the first decoding network in the i-1th layer network.
- the encoder may also include an additive network, where the additive network may be used to determine the reconstructed value of the latent variable.
- At least one layer of the multi-layer network set by the encoder may also include an additive network.
- the addition network is a soft addition network, which can be used to replace common hard addition processing.
- the reconstructed value of the latent variable can be determined through the following formula:
- f̂ (l) = g_a(Concatenate(r̂ (l), f̃ (l)))
- where g_a represents the operation of a sparse convolution layer after splicing, r̂ (l) represents the reconstructed value of the residual of the l-th layer node, f̂ (l) represents the reconstructed value of the latent variable of the l-th layer node, and f̃ (l) represents the predicted value of the latent variable of the l-th layer node.
- when the encoder determines the reconstructed value of the latent variable of the i-th layer node, the encoder may first input the reconstructed value of the residual of the i-th layer node and the predicted value of the latent variable of the i-th layer node into the addition network, and the reconstructed value of the latent variable of the i-th layer node can then be obtained.
- the additive network may include a sparse convolutional network and a feature fusion network Concatenate.
- Figure 9 is a schematic diagram of the network structure of the additive network.
- the network structure of the additive network can be a sparse convolution layer with a convolution kernel size of 3 and a stride of 1, together with a Concatenate network.
- Figure 10 is a schematic diagram of the implementation of the addition network. As shown in Figure 10, r̂ (l) represents the reconstructed value of the residual of the l-th layer node, f̂ (l) represents the reconstructed value of the latent variable of the l-th layer node, and f̃ (l) represents the predicted value of the latent variable of the l-th layer node. After r̂ (l) and f̃ (l) are fused by the Concatenate network, the reconstructed value f̂ (l) of the latent variable of the l-th layer node is finally output through the operation of the sparse convolution layer.
- soft addition/subtraction networks are used instead of the original hard addition and subtraction, which greatly improves the network flexibility.
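The gain in flexibility can be seen in a tiny dense sketch (all names and shapes here are illustrative assumptions; the real networks use sparse convolutions): a soft subtraction is a channel-wise concatenation followed by a learned convolution, and for one particular choice of weights it collapses to the ordinary hard subtraction.

```python
import numpy as np

C = 4  # feature channels

def soft_sub(f, f_pred, w):
    # Soft subtraction: Concatenate along channels, then a learned (1x1)
    # convolution, written as a plain matrix product for dense features.
    return np.concatenate([f, f_pred], axis=1) @ w

# With these particular weights the soft network reproduces the hard
# subtraction f - f_pred exactly ...
w_hard = np.concatenate([np.eye(C), -np.eye(C)], axis=0)

f = np.arange(8.0).reshape(2, C)
f_pred = np.ones((2, C))
assert np.allclose(soft_sub(f, f_pred, w_hard), f - f_pred)
# ... but training may learn any other channel mixing, which is exactly
# the extra freedom the soft addition/subtraction networks provide.
```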
- Step 104 Input the reconstructed value of the latent variable of the i-th layer node to the first decoding network to generate a code stream of the geometric information of the i-th layer node.
- the encoder may further input the reconstructed value of the latent variable of the i-th layer node into the first decoding network, so that the code stream of the geometric information of the i-th layer node can be generated.
- At this time, the first embedding networks in the i-th layer network to the first layer network have completed the output of the corresponding embedded features.
- the encoding networks in the i-th layer network to the first layer network have completed the output of the corresponding latent variables.
- the i-1th layer network to the first layer network have completed the output of the reconstructed values of the corresponding latent variables.
- the input of the first decoding network of the i-th layer network may include: the embedded features of the i-2th layer node output by the first embedding network of the i-2th layer network, the embedded features of the i-1th layer node output by the first embedding network of the i-1th layer network, and the reconstructed value of the latent variable of the i-1th layer node obtained by the i-1th layer network; the input of the first decoding network of the i-th layer network may also include the reconstructed value of the latent variable of the i-th layer node.
- when the encoder generates the code stream of the geometric information of the i-th layer node according to the reconstructed value of the latent variable of the i-th layer node, the reconstructed value of the latent variable of the i-th layer node, the embedded features of the i-2th layer node, the embedded features of the i-1th layer node, and the reconstructed value of the latent variable of the i-1th layer node can first be input into the first decoding network of the i-th layer network, thereby generating the code stream of the geometric information of the i-th layer node.
- the probability parameters of the i-th layer node can be determined first; the code stream of the geometric information of the i-th layer node can then be generated based on the probability parameters of the i-th layer node.
- the embedding features of the i-2th layer node, the embedding features of the i-1th layer node, and the reconstructed values of the latent variables of the i-1th layer node can be obtained first.
- the embedded features of the i-2th layer node can be output by the first embedding network in the i-2th layer network; the embedded features of the i-1th layer node can be output by the first embedding network in the i-1th layer network; the reconstructed value of the latent variable of the i-1th layer node can be obtained from the i-1th layer network.
- the first decoding network may include a sparse convolution network, a deconvolution layer, an initial residual network, a feature fusion network, and a ReLU activation function.
- Figure 11 is a schematic diagram of the network structure of the decoding network.
- the network structure of the first decoding network can be a Concatenate network followed by two sparse convolution layers with a convolution kernel size of 3.
- Figure 12 is a schematic diagram 1 of the implementation of the decoding network.
- e (l-1) represents the embedding feature of the l-1th layer node output by the first embedding network
- e (l-2) represents the embedding feature of the l-2th layer node output by the first embedding network.
- Figure 13 is a schematic diagram 2 of the implementation flow of the encoding method proposed in the embodiment of the present application.
- As shown in Figure 13, after the reconstructed value of the latent variable of the i-th layer node is determined according to the latent variable of the i-th layer node, the encoder's encoding method may also include the following steps:
- Step 105 Input the reconstructed value of the latent variable of the i-th layer node to the first decoding network, and determine the predicted value of the latent variable of the i+1-th layer node.
- the encoder may further input the reconstructed value of the latent variable of the i-th layer node into the first decoding network, so that the predicted value of the latent variable of the i+1th layer node can be determined.
- At this time, the first embedding networks in the i-th layer network to the first layer network have completed the output of the corresponding embedded features.
- the encoding networks in the i-th layer network to the first layer network have completed the output of the corresponding latent variables.
- the i-1th layer network to the first layer network have completed the output of the reconstructed values of the corresponding latent variables.
- the input of the first decoding network of the i-th layer network may include: the embedded features of the i-1th layer node output by the first embedding network of the i-1th layer network, and the reconstructed value of the latent variable of the i-1th layer node obtained by the i-1th layer network; the input of the first decoding network of the i-th layer network may also include the reconstructed value of the latent variable of the i-th layer node and the embedded features of the i-th layer node.
- when the encoder determines the predicted value of the latent variable of the i+1th layer node based on the reconstructed value of the latent variable of the i-th layer node, the encoder may input the reconstructed value of the latent variable of the i-th layer node, the embedded features of the i-th layer node, the embedded features of the i-1th layer node, and the reconstructed value of the latent variable of the i-1th layer node into the first decoding network of the i-th layer network, so that the predicted value of the latent variable of the i+1th layer node can be determined.
- the embedding features of the i-1th layer node and the reconstructed values of the latent variables of the i-1th layer node can be obtained first.
- the embedded features of the i-1th layer node can be output by the first embedding network in the i-1th layer network; the reconstructed value of the latent variable of the i-1th layer node can be obtained from the i-1th layer network.
- the first decoding network may include a sparse convolution network, a deconvolution layer, an initial residual network, a feature fusion network, and a ReLU activation function.
- Figure 14 is a schematic diagram 2 of the network structure of the decoding network.
- the network structure of the first decoding network can be a Concatenate network, followed by a deconvolution layer with a convolution kernel size of 2 and a stride of 2, then three initial residual networks, followed by a sparse convolution layer with a convolution kernel size of 3 and a stride of 1.
- Figure 15 is a schematic diagram 2 of the implementation of the decoding network.
- e (l) represents the embedding feature of the l-th layer node output by the first embedding network
- e (l-1) represents the embedding feature of the l-1th layer node output by the first embedding network.
- two different branches can be output based on different inputs. One branch is the code stream corresponding to the geometric information of the i-th layer node of the octree, and the other branch is the predicted value of the latent variable of the i+1th layer node of the octree.
- the code stream corresponding to the geometric information of the i-th layer node of the octree can be transmitted to the decoding end, and the predicted value of the latent variable of the i+1th layer node of the octree can be input into the i+1th layer network and used to determine the residual of the i+1th layer node and the reconstructed value of the latent variable of the i+1th layer node.
- the encoder can first perform quantization processing on the latent variables of the first layer nodes output by the coding unit of the first layer network and determine the reconstructed values of the latent variables of the first layer nodes; then the reconstructed values of the latent variables and the preset embedding features of the first layer nodes can be output to the first decoding unit of the first layer network, which then outputs the code stream of the geometric information of the first layer nodes and the predicted values of the latent variables of the second layer nodes.
- the first level may be a level in an octree in which all leaf nodes are empty.
- the last level of the octree that cannot be further divided is the first level.
- the first level may also be a level in an octree that is divided into minimum units, such as divided into minimum unit blocks of 1x1x1.
- the last level of the octree divided into the smallest unit blocks is the first level.
- the encoder may use a factorized variational auto-encoder style entropy encoder to compress the residual r (l) .
- p(x) is defined as the probability distribution of x; assuming that each dimension of x is independent, the following formula holds:
- p(x) = ∏_i p(x_i)
- the cumulative distribution c(y) fitted by the neural network corresponds to p(x); c(y) should have the following properties: c(y) is monotonically non-decreasing, c(y) → 0 as y → −∞ and c(y) → 1 as y → +∞, and the probability of each quantized symbol can be obtained as the difference p(x) = c(x + 1/2) − c(x − 1/2).
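A minimal numeric sketch of such a factorized model (using a fixed logistic function as a stand-in for the small fitted network — an assumption for illustration only):

```python
import math

def c(y):
    # A monotone cumulative function with c(-inf) = 0 and c(+inf) = 1;
    # the codec fits this with a small neural network, so the logistic
    # function here is just a stand-in.
    return 1.0 / (1.0 + math.exp(-y))

def p(x):
    # Probability of the integer symbol x as a difference of the
    # cumulative function at adjacent half-integer points.
    return c(x + 0.5) - c(x - 0.5)

total = sum(p(x) for x in range(-20, 21))
print(round(total, 6))  # probability mass over [-20, 20] -> 1.0
```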
- the final output code stream may be composed of the code stream corresponding to the output of at least one layer of the multi-layer network.
- the output code stream includes the code stream of the residual of the l-th layer node generated by the factorized variational autoencoder, the code stream of the geometric information x (l) of the l-th layer node generated by the arithmetic encoder, and the structural information.
- the encoding method proposed in steps 101 to 105 above can rely on the octree structure, use the constructed embedding network, encoding network and decoding network, and introduce latent variables to capture the correlation between sibling nodes in the same layer of the octree, which can achieve higher compression performance; at the same time, parallel encoding and decoding processing based on the levels of the octree can effectively shorten the encoding and decoding time, greatly improve the encoding and decoding efficiency, and further improve the compression performance.
- the embodiment of the present application provides an encoding method.
- the encoder includes a first embedding network, a coding network, and a first decoding network.
- the encoder divides the point cloud according to the octree to obtain the geometric information of the i-th layer node; inputs the geometric information of the i-th layer node into the first embedding network to determine the embedded features of the i-th layer node; inputs the embedded features of the i-th layer node into the encoding network to determine the latent variable of the i-th layer node; determines the reconstructed value of the residual of the i-th layer node based on the latent variable of the i-th layer node, and determines the reconstructed value of the latent variable of the i-th layer node based on the reconstructed value of the residual of the i-th layer node; and inputs the reconstructed value of the latent variable of the i-th layer node into the first decoding network to generate a code stream of the geometric information of the i-th layer node.
- It can be seen that latent variables can be used to represent the spatial correlation between nodes at the same level in the octree, ultimately achieving higher compression performance; at the same time, based on the constructed network, parallel encoding and decoding are performed on nodes at the same level of the octree, which improves encoding and decoding efficiency and further improves compression performance.
- the encoding method proposed in the embodiment of this application can be understood as an end-to-end LIDAR point cloud depth entropy model, or as an end-to-end self-supervised dynamic point cloud compression technology based on deep learning.
- it relies on the octree structure, introduces latent variables to capture the correlation of octree sibling nodes, and uses sparse convolution to construct the network, thus achieving the optimal effect in the LIDAR point cloud lossless entropy model.
- the encoding method proposed in the embodiment of this application can improve the encoding and decoding efficiency of LIDAR point cloud.
- the embodiments of the present application have lower encoding and decoding time. This is because the encoding and decoding of each node in the embodiments of the present application are processed in parallel on a per-layer basis, and efficient sparse convolution is used to construct the network; therefore, the encoding and decoding time is much lower than that of common encoding and decoding schemes.
- the network that performs the encoding method proposed in the embodiment of the present application can make full use of hierarchical latent variables to capture the correlation of LIDAR point clouds, and can also use residual coding to decompose and compress the latent variables. Furthermore, soft addition/subtraction networks can be used to replace the original hard addition/subtraction, which improves the flexibility of the network.
- Figure 16 is a schematic diagram of the overall framework for executing the encoding method.
- the encoder can be provided with a multi-layer network, wherein each layer of the multi-layer network may consist of an encoding network (encoder), a decoding network (decoder), and an embedding network (embedding).
- the l-th layer network in the multi-layer network can perform parallel encoding processing on the geometric information of the l-th layer nodes.
- x (l) represents the geometric information of the l-th layer node
- e (l) represents the embedded feature (occupancy embedding) of the l-th layer node output by the embedding network
- f (l) represents the latent variable of the l-th layer node.
- the +/- symbols in the figure represent soft addition/subtraction networks (addition networks/subtraction networks), and Q represents quantization operations.
- the network provided in the encoder may be composed of a 5-layer network.
- the geometric information x (l) of the l-th layer node passes through the embedding network to obtain the occupancy embedding e (l) (embedded feature).
- the encoding network captures the spatial correlation between e (l) and the latent variable f (l+1) of the previous layer (layer l+1), and outputs the latent variable f (l) of the current layer node.
- the decoding network is divided into two paths, which generate the code stream of the current layer x (l) of the octree and the predicted value of the latent variable of the nodes in the previous layer. Among them, the residual between the predicted value f̃ (l) of the latent variable of the l-th layer node and the real value f (l) is r (l); after quantization, the reconstructed value of the residual of the l-th layer node is obtained and losslessly entropy coded by the factorized variational autoencoder.
- the encoder may first quantize the latent variable f (l-5) of the first layer node output by the encoding network of the first layer network to determine the reconstructed value of the latent variable of the first layer node. Then the reconstructed value of the latent variable of the first layer node and the preset embedding feature e (l-6) can be output to the first decoding unit of the first layer network, which then outputs the code stream of the geometric information of the first layer node and the predicted value of the latent variable of the second layer node.
- the input of the encoding network of the l-1th layer network may include the latent variables of the l-th layer nodes output by the encoding network of the l-th layer network. That is to say, in the embodiment of the present application, the encoder can input the embedded features of the l-1th layer node and the latent variables of the l-th layer node into the encoding network of the l-1th layer network, so that the latent variables of the l-1th layer node can be determined.
- the embedded features of the l-1th layer node are output by the first embedding unit of the l-1th layer network, and the latent variables of the l-th layer node are output by the encoding network of the l-th layer network.
- the input of the first decoding network of the l-1th layer network may include: the embedded features of the l-3th layer node output by the first embedding network of the l-3th layer network, the embedded features of the l-2th layer node output by the first embedding network of the l-2th layer network, and the reconstructed value of the latent variable of the l-2th layer node obtained by the l-2th layer network.
- the input of the first decoding network of the l-1 layer network may also include the reconstructed value of the latent variable of the l-1 layer node.
- when the encoder generates the code stream of the geometric information of the l-1th layer node, it can first input the reconstructed value of the latent variable of the l-1th layer node, the embedded features of the l-3th layer node, the embedded features of the l-2th layer node, and the reconstructed values of the latent variables of the l-2th layer node into the first decoding network of the l-1th layer network, thereby generating the code stream of the geometric information of the l-1th layer node.
- when the encoder determines the predicted value of the latent variable of the l-th layer node, the reconstructed value of the latent variable of the l-1th layer node, the embedded features of the l-1th layer node, the embedded features of the l-2th layer node, and the reconstructed value of the latent variable of the l-2th layer node can be input into the first decoding network of the l-1th layer network, so that the predicted value of the latent variable of the l-th layer node can be determined.
- during the subsequent encoding processing of the l-th layer, the predicted value of the latent variable of the l-th layer node is used, and the subtraction network and the addition network are used to determine the residual of the l-th layer node and the reconstructed value of the latent variable of the l-th layer node, respectively.
- the encoding network can be used to explore the spatial correlation between e (l) and the latent variable f (l+1) of the previous layer (the l+1th layer), and to output the latent variable f (l) of the current layer node.
- a sparse tensor can be used to represent an octree, where the coordinate matrix of the sparse tensor represents the node coordinates of the octree; the attribute matrix of the sparse tensor represents the node occupancy code of the octree.
- the encoding network can be implemented using sparse convolution, where the downsampling of f (l+1) is implemented by a sparse convolution layer with a convolution kernel size of 2 and a stride of 2, followed by a ReLU activation function. After that, the initial residual network is used to aggregate the sibling node features, and the result is output to the Concatenate network for feature fusion with e (l). Finally, a sparse convolution layer is used to fuse e (l) with the downsampled f (l+1) to obtain the latent variable of the lower-layer node, that is, the latent variable f (l) of the l-th layer node.
- the decoder network (decoder) is used to generate the code stream of the current layer x (l) of the octree and the predicted value of the latent variable of the node in the previous layer.
- One branch includes: after obtaining e (l-1) and e (l-2), the Concatenate network can be used to splice e (l-1) and e (l-2); then the probability p (l) of each node in the l-th layer is predicted through a two-layer sparse convolution network, and finally the binary code stream corresponding to x (l) is generated by entropy coding through an arithmetic encoder.
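This branch can be sketched densely as below. It is a rough stand-in only: `predict_occupancy`, the shapes, and the sigmoid output head are assumptions (the real network is a two-layer sparse convolution feeding an arithmetic coder); the final sum is the ideal code length an arithmetic encoder would approach.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_occupancy(e_prev1, e_prev2, w1, w2):
    # Context from the two coarser layers -> per-node occupancy probability.
    ctx = np.concatenate([e_prev1, e_prev2], axis=1)  # Concatenate network
    h = np.maximum(ctx @ w1, 0.0)                     # conv layer 1 + ReLU
    p = sigmoid(h @ w2)                               # conv layer 2 + head
    return np.clip(p, 1e-9, 1 - 1e-9)                 # numerical safety

rng = np.random.default_rng(1)
p = predict_occupancy(rng.normal(size=(6, 16)), rng.normal(size=(6, 16)),
                      rng.normal(size=(32, 8)), rng.normal(size=(8, 1)))
occ = (rng.random((6, 1)) < 0.5).astype(float)  # true occupancy bits
bits = -np.sum(occ * np.log2(p) + (1 - occ) * np.log2(1 - p))
print(bits >= 0)  # ideal arithmetic-coding cost of x(l) is non-negative
```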
- The other branch includes: after obtaining e (l) and e (l-1), the Concatenate network can be used to splice e (l) and e (l-1); the result then passes through a deconvolution layer with a convolution kernel size of 2 and a stride of 2, and the predicted value of the latent variable of the l+1th layer node is then obtained through the initial residual network.
- the embedding network can be used to map octree nodes from a 256-dimensional one-hot vector to a 16-dimensional continuous feature space, so that nodes with similar occupancy situations in space are also similar in the feature space.
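A sketch of that mapping (the table `E` is random here rather than learned — an illustrative assumption): multiplying the 256-dimensional one-hot vector by the embedding matrix is equivalent to a table-row lookup.

```python
import numpy as np

rng = np.random.default_rng(2)
E = rng.normal(size=(256, 16))  # embedding table: 256 codes -> 16 dims

def embed(occupancy_code):
    # Map an 8-bit occupancy code to its 16-dimensional embedding via the
    # one-hot product; this amounts to selecting row `occupancy_code`.
    one_hot = np.zeros(256)
    one_hot[occupancy_code] = 1.0
    return one_hot @ E

v = embed(0b00000011)
print(v.shape)  # (16,)
```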
- At least one layer of the multi-layer network set by the encoder may also include a subtraction network and an addition network.
- the encoding method proposed in the embodiment of this application can rely on the octree structure, use the built embedding network, encoding network and decoding network, and introduce latent variables to capture the correlation between sibling nodes in the same layer of the octree, which can achieve higher compression performance; at the same time, parallel encoding and decoding processing based on the levels of the octree can effectively shorten the encoding and decoding time, greatly improve the encoding and decoding efficiency, and further improve the compression performance.
- the embodiment of the present application provides an encoding method.
- the encoder includes a first embedding network, a coding network, and a first decoding network.
- the encoder divides the point cloud according to the octree to obtain the geometric information of the i-th layer node.
- the reconstructed value of the residual of the i-th layer node is determined based on the latent variable of the i-th layer node, and the reconstructed value of the latent variable of the i-th layer node is determined based on the reconstructed value of the residual of the i-th layer node; the reconstructed value of the latent variable of the i-th layer node is then input to the first decoding network, and a code stream of the geometric information of the i-th layer node is generated.
- latent variables can be used to represent the spatial correlation between nodes at the same level of the octree, ultimately achieving higher compression performance; at the same time, based on the constructed network, parallel encoding and decoding processing is performed on nodes at the same level of the octree, which improves encoding and decoding efficiency and further improves compression performance.
- the embodiment of the present application proposes a point cloud decoding method, which can be applied to a decoder, where the decoder can include a second embedding network and a second decoding network.
- the decoder may be provided with a multi-layer network, wherein at least one layer of the multi-layer network may include a second embedding network and a second decoding network.
- the decoder may be provided with a 5-layer network, wherein one or more layers of the 5-layer network may include a second embedding network and a second decoding network.
- FIG. 17 is a schematic flowchart 1 of the implementation of the decoding method proposed by the embodiment of the present application.
- the method for the decoder to perform decoding processing may include the following steps:
- Step 201 Decode the code stream and determine the reconstruction value of the residual of the i-th layer node obtained by dividing the point cloud according to the octree; where i is greater than 2.
- the decoder can determine the reconstruction value of the residual of the i-th layer node obtained by dividing the point cloud according to the octree by decoding the code stream.
- i may be an integer greater than 2.
- the i-th layer network in the multi-layer network can perform parallel decoding processing on the geometric information of the i-th layer nodes.
- when performing geometric encoding processing on the point cloud, an octree may be constructed first.
- the octree structure is used to recursively divide the point cloud space into eight sub-blocks of the same size, and the occupied code words of each sub-block are judged.
- if the sub-block does not contain points, it is recorded as empty; otherwise, it is recorded as non-empty.
- the occupied codeword information of all blocks is recorded in the last layer of the recursive division, and geometric encoding is performed; on the one hand, the geometric information expressed through the octree structure can further form a geometric code stream; on the other hand, the reconstructed geometric information obtained during the geometric reconstruction process can be used as additional information for attribute encoding.
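The recursive division described above can be sketched as follows. This is an illustrative helper, not taken from the patent: the mapping of the eight sub-blocks to bit positions is an assumption, since the text does not fix a bit ordering.

```python
import numpy as np

def occupancy_codeword(points, origin, size):
    """Return the 8-bit occupancy codeword of a cubic block.

    Bit k is set when the k-th of the eight equal sub-blocks
    contains at least one point (empty sub-blocks leave the bit 0).
    The bit ordering (low bits = x, y, z) is an assumption.
    """
    half = size / 2.0
    code = 0
    for k in range(8):
        # Sub-block corner from the 3 low bits of k (x, y, z axes).
        offset = origin + half * np.array([k & 1, (k >> 1) & 1, (k >> 2) & 1])
        inside = np.all((points >= offset) & (points < offset + half), axis=1)
        if inside.any():
            code |= 1 << k  # non-empty sub-block
    return code

pts = np.array([[0.1, 0.1, 0.1], [0.9, 0.9, 0.9]])
print(occupancy_codeword(pts, np.array([0.0, 0.0, 0.0]), 1.0))  # prints 129
```

Recursing on each non-empty sub-block down to the last layer yields the per-layer codewords that the geometric encoding operates on.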
- the reconstruction value of the residual of the i-th layer node of the point cloud can be determined.
- the parent node layer can be defined as a high-level layer and the child node layer as a low-level layer; alternatively, the parent node layer can be defined as a low-level layer and the child node layer as a high-level layer.
- the encoding sequence in which the encoder performs the encoding process may be in the order from the root node to the leaf node of the constructed octree.
- the decoding order may be in the order from the leaf nodes to the root node of the constructed octree.
- the root node layer of the constructed octree is the i-th layer
- the layers toward the leaf nodes are, in sequence, the i-1th layer to the first layer.
- the order of encoding processing is from the i-th layer to the first layer
- the order of decoding processing is the opposite of encoding, that is, from the first layer to the i-th layer.
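The two traversal orders can be made concrete with a trivial sketch (illustrative only; the helper names are not from the patent):

```python
def encoding_order(i):
    """Layer order for encoding: from the root layer i down to layer 1."""
    return list(range(i, 0, -1))

def decoding_order(i):
    """Layer order for decoding: the reverse, from layer 1 up to layer i."""
    return list(range(1, i + 1))

print(encoding_order(5))  # [5, 4, 3, 2, 1]
print(decoding_order(5))  # [1, 2, 3, 4, 5]
```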
- the residual of the i-th layer node can be determined first based on the predicted value of the latent variable of the i-th layer node and the latent variable of the i-th layer node; the residual of the i-th layer node is then quantized, after which the reconstructed value of the residual of the i-th layer node can be determined.
- the encoder can write the reconstructed value of the residual of the i-th layer node into the code stream and transmit it to the decoder.
- the encoder can decompose and compress the latent variables using residual coding.
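The residual decomposition above can be sketched as follows. This is a minimal numerical sketch, assuming a uniform quantizer with step size `step` (the patent does not specify the quantizer), and using a plain hard sum for the reconstruction, whereas the text replaces it with a learned addition network.

```python
import numpy as np

def encode_latent_residual(latent, latent_pred, step=1.0):
    """Residual decomposition of a latent variable (sketch).

    The quantized residual is what is written to the code stream;
    the reconstructed latent is what encoder and decoder both use
    afterwards, so the two sides stay in sync.
    """
    residual = latent - latent_pred                  # r(l)
    residual_rec = np.round(residual / step) * step  # reconstructed residual
    latent_rec = latent_pred + residual_rec          # reconstructed latent
    return residual_rec, latent_rec

r_rec, f_rec = encode_latent_residual(np.array([1.3, -0.2]), np.array([1.0, 0.0]))
```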
- Step 202 Determine the reconstructed value of the latent variable of the i-th layer node based on the reconstructed value of the residual of the i-th layer node.
- after the decoder determines the reconstructed value of the residual of the i-th layer node by decoding the code stream, it can determine the reconstructed value of the latent variable of the i-th layer node based on the reconstructed value of the residual of the i-th layer node.
- the predicted value of the latent variable of the i-th layer node can be obtained first.
- the predicted value of the latent variable of the i-th layer node can be predicted by the second decoding network of the layer network below the current layer; that is, it can be the predicted value of the latent variable of the i-th layer node output by the second decoding network in the i-1th layer network.
- the decoder can then determine the reconstructed value of the latent variable of the i-th layer node based on the reconstructed value of the residual of the i-th layer node and the predicted value of the latent variable of the i-th layer node.
- the predicted value of the latent variable of the i-th layer node may be output by the second decoding network in the i-1th layer.
- the decoder may also include an addition network, where the addition network may be used to determine the reconstructed value of the latent variable.
- At least one layer of the multi-layer network set up by the decoder may also include an additive network.
- the addition network is a soft addition network, which can be used to replace common hard addition processing.
- the determination of the predicted value of the latent variable can be achieved through the above formula (2).
- when the decoder determines the reconstructed value of the latent variable of the i-th layer node, it may first input the reconstructed value of the residual of the i-th layer node and the predicted value of the latent variable of the i-th layer node into the addition network, thereby obtaining the reconstructed value of the latent variable of the i-th layer node.
- the additive network may include a sparse convolutional network and a feature fusion network Concatenate.
- the network structure of the additive network can be a sparse convolution layer with a convolution kernel size of 3 and a step size of 1, followed by a Concatenate network.
- a soft addition network (addition network) is used to replace the original hard addition, which greatly improves the network flexibility.
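The soft addition can be sketched as follows. This is a dense numpy stand-in: the learned linear map stands in for the sparse convolution layer plus Concatenate network described above, the weights are random placeholders rather than trained parameters, and the channel width `C = 16` is assumed from the 16-dimensional embedding.

```python
import numpy as np

rng = np.random.default_rng(0)

def soft_add(residual_rec, latent_pred, weight, bias):
    """Soft addition network (sketch).

    Instead of the hard sum residual_rec + latent_pred, the two
    feature maps are concatenated along the channel axis and fused
    by a learned map, letting the network weight the two inputs.
    """
    fused = np.concatenate([residual_rec, latent_pred], axis=-1)
    return fused @ weight + bias

C = 16  # assumed channel width of the latent features
W = rng.normal(size=(2 * C, C))
b = np.zeros(C)
f_rec = soft_add(rng.normal(size=(5, C)), rng.normal(size=(5, C)), W, b)
print(f_rec.shape)
```

Note that with `W = [[I], [I]]` and `b = 0` this reduces exactly to the hard addition, which is why the soft version is strictly more flexible.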
- the latent variable of the i-th layer node can represent the correlation between the i-th layer sibling nodes. That is to say, in this application, the decoder can make full use of hierarchical latent variables to capture the correlation of LIDAR point clouds.
- Step 203 Input the reconstructed value of the latent variable of the i-th layer node to the second decoding network to determine the geometric information of the i-th layer node.
- the decoder may input the reconstructed value of the latent variable of the i-th layer node into the second decoding network, so that the geometric information of the i-th layer node can be determined.
- the geometric information of the i-th layer node of the point cloud can represent information such as the position coordinates of the node, and can also represent the occupied-codeword situation of the i-th layer node, that is, its occupancy status (occupancy information).
- the geometric information of the i-th layer node includes both the position coordinate information of the node and the occupancy information of the node.
- the input of the second embedding network is the geometric information of the i-th layer node, where the geometric information serving as the input of the second embedding network may include both the position coordinate information of the node and the occupancy information of the node; the geometric information output by the second decoding network mainly includes the occupancy information of the node, and may or may not include the position coordinate information of the node.
- the embedding network needs to refer to the occupancy information and position coordinate information of the node at the same time, while the decoding network focuses on generating the occupancy information.
- the second embedding networks in the i-1th layer network to the first layer network have completed the output of the corresponding embedded features; at the same time, the i-1th layer network to the first layer network have completed the output of the corresponding reconstructed values of the latent variables.
- the input of the second decoding network of the i-th layer network may include: the embedded feature of the i-2th layer node output by the second embedding network of the i-2th layer network, the embedded feature of the i-1th layer node output by the second embedding network of the i-1th layer network, and the reconstructed value of the latent variable of the i-1th layer node obtained by the i-1th layer network; the input of the second decoding network of the i-th layer network may also include the reconstructed value of the latent variable of the i-th layer node.
- when the decoder generates the geometric information of the i-th layer node based on the reconstructed value of the latent variable of the i-th layer node, it may first input the reconstructed value of the latent variable of the i-th layer node, the embedded feature of the i-2th layer node, the embedded feature of the i-1th layer node, and the reconstructed value of the latent variable of the i-1th layer node into the second decoding network of the i-th layer network, thereby generating the geometric information of the i-th layer node.
- the probability parameters of the i-th layer nodes can be determined first; then the geometric information of the i-th layer nodes can be further generated according to the probability parameters of the i-th layer nodes.
- the embedding features of the i-2th layer node, the embedding features of the i-1th layer node, and the reconstructed values of the latent variables of the i-1th layer node can be obtained first.
- the embedded feature of the i-2th layer node can be obtained by the second embedding network of the network two layers below the current layer, that is, it can be output by the second embedding network in the i-2th layer network; the embedded feature of the i-1th layer node can be obtained by the second embedding network of the network one layer below the current layer, that is, it can be output by the second embedding network in the i-1th layer network; the reconstructed value of the latent variable of the i-1th layer node can be obtained from the network one layer below the current layer.
- the second decoding network may include a sparse convolution network, a deconvolution layer, an initial residual network, a feature fusion network, and a ReLU activation function.
- the network structure of the second decoding network may be a Concatenate network followed by two sparse convolution layers with a convolution kernel size of 3 and a step size of 1, followed by an autodecoder, such as Binary AE.
- e (l-1) represents the embedding feature of the l-1 layer node output by the second embedding network
- e (l-2) represents the embedding feature of the l-2th layer node output by the second embedding network.
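The occupancy branch of this decoding network can be sketched as follows. This is a dense numpy stand-in, assuming the branch maps the concatenated lower-layer embeddings to a 256-way distribution over occupancy codewords for the entropy codec; the two learned maps stand in for the two 3x3x3 sparse convolution layers, the weights are random placeholders, and the hidden width 64 is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def occupancy_probabilities(e_prev, e_prev2, w1, w2):
    """Occupancy branch of the second decoding network (sketch).

    Concatenates the embedded features of the two lower layers,
    applies two learned maps (standing in for the two sparse
    convolution layers), and normalizes to per-node probabilities
    over the 256 possible occupancy codewords.
    """
    h = np.concatenate([e_prev, e_prev2], axis=-1)  # Concatenate network
    h = np.maximum(h @ w1, 0.0)                     # layer 1 + ReLU
    return softmax(h @ w2)                          # layer 2 -> p(l)

C = 16  # assumed embedding width
w1 = rng.normal(size=(2 * C, 64))
w2 = rng.normal(size=(64, 256))
p = occupancy_probabilities(rng.normal(size=(4, C)), rng.normal(size=(4, C)), w1, w2)
```

Each row of `p` would then drive the entropy decoder for one node of the l-th layer.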
- the decoder can continue to input the geometric information of the i-th layer node into the second embedding network, so that the embedded feature of the i-th layer node can be determined.
- the second embedding network may be an occupied embedding network embedding.
- the second embedding network can be used to map the octree nodes from the 256-dimensional one-hot vector to the 16-dimensional continuous feature space, so that similar occupancy conditions in the space are also similar in the feature space.
- the second embedding network may include a sparse convolutional network.
- the network structure of the second embedding network may be composed of a multi-layer sparse convolutional network.
- the network structure of the second embedding network may be a three-layer sparse convolutional network.
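The one-hot-to-16-dimensional mapping can be sketched as a table lookup. This is an illustrative stand-in: a random embedding table replaces the trained three-layer sparse convolutional network described above, and a direct row lookup replaces the explicit one-hot multiplication (the two are equivalent).

```python
import numpy as np

rng = np.random.default_rng(0)

# One row per possible 8-bit occupancy codeword; random values
# stand in for the trained network's learned mapping.
embedding_table = rng.normal(size=(256, 16))

def embed_occupancy(codewords):
    """Map 8-bit occupancy codewords (0..255) to 16-dim features.

    Equivalent to multiplying a 256-dim one-hot vector by the
    table, done here by direct row indexing.
    """
    return embedding_table[np.asarray(codewords)]

e = embed_occupancy([0, 129, 255])
print(e.shape)  # (3, 16)
```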
- after the decoder inputs the geometric information of the i-th layer node in the point cloud into the second embedding network, it can output the embedded feature of the i-th layer node.
- the embedded features of the i-th layer node can determine the occupancy status of the i-th layer node after it is mapped to the feature space.
- x (l) represents the geometric information of the l-th layer node
- e (l) represents the embedding feature of the l-th layer node output by the second embedding network
- the decoder can input the geometric information x (l) of the l-th layer node into the second embedding network, so that the corresponding embedding feature e (l) of the l-th layer node can be obtained through the second embedding network.
- Figure 18 is a schematic diagram 2 of the implementation flow of the decoding method proposed in the embodiment of the present application.
- after the reconstructed value of the latent variable of the i-th layer node is determined based on the reconstructed value of the residual of the i-th layer node, the decoding method performed by the decoder may also include the following steps:
- Step 204 Input the reconstructed value of the latent variable of the i-th layer node into the second decoding network to determine the predicted value of the latent variable of the i+1th layer node.
- the decoder can further input the reconstructed value of the latent variable of the i-th layer node into the second decoding network, so that the predicted value of the latent variable of the i+1th layer node can be determined.
- the second embedding networks in the i-th layer network to the first layer network have completed the output of the corresponding embedded features; at the same time, the i-1th layer network to the first layer network have completed the output of the corresponding reconstructed values of the latent variables.
- the input of the second decoding network of the i-th layer network may include: the embedded feature of the i-1th layer node output by the second embedding network of the i-1th layer network, and the reconstructed value of the latent variable of the i-1th layer node obtained by the i-1th layer network; the input of the second decoding network of the i-th layer network may also include the reconstructed value of the latent variable of the i-th layer node and the embedded feature of the i-th layer node.
- when the decoder determines the predicted value of the latent variable of the i+1th layer node based on the reconstructed value of the latent variable of the i-th layer node, it may input the reconstructed value of the latent variable of the i-th layer node, the embedded feature of the i-th layer node, the embedded feature of the i-1th layer node, and the reconstructed value of the latent variable of the i-1th layer node into the second decoding network of the i-th layer network, so that the predicted value of the latent variable of the i+1th layer node can be determined.
- the embedding features of the i-1th layer node and the reconstructed values of the latent variables of the i-1th layer node can be obtained first.
- the embedded feature of the i-1th layer node can be obtained by the second embedding network of the network one layer below the current layer, that is, it can be output by the second embedding network in the i-1th layer network; the reconstructed value of the latent variable of the i-1th layer node can be obtained from the network one layer below the current layer.
- the second decoding network may include a sparse convolution network, a deconvolution layer, an initial residual network, a feature fusion network, and a ReLU activation function.
- the network structure of the second decoding network may be a Concatenate network followed by a deconvolution layer with a convolution kernel size of 2 and a step size of 2, followed by three initial residual networks, followed by a sparse convolution layer with a convolution kernel size of 3 and a stride of 1.
- e (l) represents the embedding feature of the l-th layer node output by the second embedding network
- e (l-1) represents the embedding feature of the l-1th layer node output by the second embedding network.
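The latent-prediction branch can be sketched as follows. This is a dense numpy stand-in: the stride-2, kernel-2 deconvolution on an octree doubles the resolution along each axis, so every occupied parent node yields features for its 2x2x2 = 8 children, modelled here as one learned map per child position. The initial residual networks are omitted for brevity, the weights are random placeholders, and the channel width is assumed.

```python
import numpy as np

rng = np.random.default_rng(0)

def predict_child_latents(f_rec, e, w_up):
    """Latent-prediction branch of the second decoding network (sketch).

    Fuses the reconstructed latent of the current layer with its
    embedding (the Concatenate step), then expands each parent
    node into 8 child latent predictions, standing in for the
    kernel-2, stride-2 deconvolution layer.
    """
    h = np.concatenate([f_rec, e], axis=-1)       # fuse latent with embedding
    # w_up: (2C, 8*C) -> one C-dim prediction per child position
    children = (h @ w_up).reshape(h.shape[0], 8, -1)
    return children

C = 16  # assumed latent channel width
w_up = rng.normal(size=(2 * C, 8 * C))
pred = predict_child_latents(rng.normal(size=(3, C)), rng.normal(size=(3, C)), w_up)
print(pred.shape)  # (3, 8, 16)
```

These per-child predictions are what the next layer's addition network combines with the decoded residuals.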
- two different branches can be output based on different inputs.
- One branch is the geometric information of the i-th layer node of the octree
- the other branch is the predicted value of the latent variable of the i+1th layer node of the octree.
- the predicted value of the latent variable of the i+1th layer node of the octree can be input into the i+1th layer network for determining the reconstructed value of the latent variable of the i+1th layer node.
- the decoder determines the reconstructed value of the latent variable of the first layer node using the reconstructed value of the residual of the first layer node obtained by decoding the code stream; the reconstructed value of the latent variable of the first layer node and the preset embedded feature can then be input to the second decoding network of the first layer network, which outputs the geometric information of the first layer node and the predicted value of the latent variable of the second layer node respectively.
- the first layer may be the layer of the octree in which all nodes are leaf nodes.
- the last level of the octree that cannot be further divided is the first level.
- the first level may also be a level in an octree that is divided into minimum units, such as divided into minimum unit blocks of 1x1x1. That is to say, when constructing an octree for a point cloud, the last level of the octree divided into the smallest unit block is the first level.
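The number of layers down to the 1x1x1 minimum unit can be computed directly. This is an illustrative helper, not taken from the patent: each division halves the block edge, so the depth is the base-2 logarithm of the edge ratio, rounded up.

```python
import math

def octree_depth(extent, leaf_size=1.0):
    """Number of octree division levels needed so that the leaves
    are cubes of edge `leaf_size` (e.g. the 1x1x1 minimum unit)."""
    return max(0, math.ceil(math.log2(extent / leaf_size)))

print(octree_depth(64.0))   # 6
print(octree_depth(100.0))  # 7
```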
- the decoder may use a factorized variational auto-encoder style entropy codec to compress the residual r (l) .
- the finally received code streams may be composed of code streams corresponding to at least one layer of the multi-layer network.
- the corresponding code stream includes the code stream of the residual r (l) of the l-th layer node generated by the factorized variational autoencoder, the code stream of the geometric information x (l) of the l-th layer node generated by the arithmetic encoder, and the structural information.
- the decoding method proposed in steps 201 to 204 above can rely on the octree structure, use the built embedding network and decoding network, and introduce latent variables to capture the correlation between sibling nodes in the same layer of the octree, thereby achieving higher compression performance; at the same time, parallel encoding and decoding processing based on the levels of the octree can effectively shorten the encoding and decoding time, greatly improving the encoding and decoding efficiency and further improving the compression performance.
- Embodiments of the present application provide a decoding method.
- the decoder includes a second decoding network.
- the decoder decodes the code stream and determines the reconstructed value of the residual of the i-th layer node obtained by dividing the point cloud according to the octree, where i is greater than 2; determines the reconstructed value of the latent variable of the i-th layer node based on the reconstructed value of the residual of the i-th layer node; and inputs the reconstructed value of the latent variable of the i-th layer node into the second decoding network to determine the geometric information of the i-th layer node.
- latent variables can be used to represent the spatial correlation between nodes at the same level of the octree, ultimately achieving higher compression performance; at the same time, based on the constructed network, parallel encoding and decoding processing is performed on nodes at the same level of the octree, which improves encoding and decoding efficiency and further improves compression performance.
- the decoding method proposed in the embodiment of this application can be understood as an end-to-end LIDAR point cloud depth entropy model, or as an end-to-end self-supervised dynamic point cloud compression technology based on deep learning.
- it relies on the octree structure, introduces latent variables to capture the correlation of octree sibling nodes, and uses sparse convolution to construct the network, thus achieving the optimal effect in the LIDAR point cloud lossless entropy model.
- the decoding method proposed in the embodiment of this application can improve the encoding and decoding efficiency of LIDAR point cloud.
- the embodiments of the present application have a lower encoding and decoding time; this is because the encoding and decoding of each node in the embodiments of the present application are processed in parallel on a per-layer basis, and efficient sparse convolution is used to construct the network, so the encoding and decoding time is much lower than that of common encoding and decoding schemes.
- the network that performs the decoding method proposed in the embodiment of the present application can make full use of hierarchical latent variables to capture LIDAR point cloud correlation, and can use residual coding to decompose and compress the latent variables; furthermore, a soft addition/subtraction network can be used to replace the original hard addition/subtraction, which improves the flexibility of the network.
- FIG. 19 is a schematic diagram of an overall framework for executing a decoding method.
- the decoder may be provided with a multi-layer network, wherein at least one layer of the multi-layer network may include a second embedding network (embedding) and a second decoding network (decoder).
- each layer is composed of a decoding network (decoder) and an embedding network (embedding).
- the l-th layer network in the multi-layer network can perform parallel decoding processing on the geometric information of the l-th layer nodes.
- x (l) represents the geometric information of the l-th layer node
- e (l) represents the embedded feature of the l-th layer node after passing through the occupancy embedding network
- f (l) represents the latent variable of the l-th layer node.
- the + symbol in the figure represents a soft additive network (additive network).
- the network provided in the decoder may be composed of a 5-layer network.
- the geometric information x (l) of the l-th layer node passes through the embedding network to obtain the occupancy embedding e (l) (embedded feature).
- the decoding network is divided into two paths, which respectively generate the current layer x (l) of the octree and the predicted value of the latent variable of the nodes in the previous layer.
- the decoder can first determine the reconstructed value of the latent variable of the node in the first layer; the reconstructed value of the latent variable of the first layer node and the preset embedded feature e (l-6) can then be input to the second decoding network of the first layer network, which outputs the geometric information of the first layer node and the predicted value of the latent variable of the second layer node respectively.
- the input of the second decoding network of the l-1 layer network may include: the embedded features of the l-3 layer nodes output by the second embedding network of the l-3 layer network, the embedded features of the l-2 layer nodes output by the second embedding network of the l-2 layer network, and the reconstructed values of the latent variables of the l-2 layer nodes obtained by the l-2 layer network; the input of the second decoding network of the l-1 layer network may also include the reconstructed values of the latent variables of the l-1 layer nodes.
- the decoder when the decoder generates the geometric information of the l-1 layer nodes, it may first input the reconstructed values of the latent variables of the l-1 layer nodes, the embedded features of the l-3 layer nodes, the embedded features of the l-2 layer nodes, and the reconstructed values of the latent variables of the l-2 layer nodes into the second decoding network of the l-1 layer network, thereby generating the geometric information of the l-1 layer nodes.
- when the decoder determines the predicted value of the latent variable of the l-th layer node, it inputs the reconstructed value of the latent variable of the l-1th layer node, the embedded feature of the l-1th layer node, the embedded feature of the l-2th layer node, and the reconstructed value of the latent variable of the l-2th layer node into the second decoding network of the l-1th layer network, so that the predicted value of the latent variable of the l-th layer node can be determined.
- in the subsequent decoding process of the l-th layer node, the predicted value of the latent variable of the l-th layer node is used, and the addition network is used to determine the reconstructed value of the latent variable of the l-th layer node.
- the decoding network (decoder) is used to generate the current layer x (l) of the octree and the predicted value of the latent variable of the nodes in the previous layer.
- One branch includes: after obtaining e (l-1) and e (l-2) , the Concatenate network can be used to concatenate e (l-1) and e (l-2) ; a two-layer sparse convolutional network is then used to predict the probability p (l) of each node in the l-th layer, and finally an arithmetic coder is used for entropy decoding to generate x (l) .
- Another branch includes: after obtaining e (l) and e (l-1) , the Concatenate network can be used to concatenate them; the result is passed through a deconvolution layer with a convolution kernel size of 2 and a step size of 2, and then the predicted value of the latent variable of the l+1th layer node is obtained through the initial residual network.
- the embedding network can be used to map the octree nodes from the 256-dimensional one-hot vector to the 16-dimensional continuous feature space, so that nodes with similar occupancy situations in space are also similar in the feature space.
- At least one layer of the multi-layer network set up by the decoder may also include an additive network.
- the soft addition network (additive network) is used to replace the original hard addition, which greatly improves the flexibility of the network.
- the decoding method proposed in the embodiment of this application can rely on the octree structure, use the built embedding network, encoding network and decoding network, and introduce latent variables to capture the correlation between sibling nodes in the same layer of the octree, thereby achieving higher compression performance; at the same time, parallel encoding and decoding processing based on the levels of the octree can effectively shorten the encoding and decoding time, greatly improving the encoding and decoding efficiency and further improving the compression performance.
- Embodiments of the present application provide a decoding method.
- the decoder includes a second decoding network.
- the decoder decodes the code stream and determines the reconstructed value of the residual of the i-th layer node obtained by dividing the point cloud according to the octree, where i is greater than 2; determines the reconstructed value of the latent variable of the i-th layer node based on the reconstructed value of the residual of the i-th layer node; and inputs the reconstructed value of the latent variable of the i-th layer node into the second decoding network to determine the geometric information of the i-th layer node.
- latent variables can be used to represent the spatial correlation between nodes at the same level of the octree, ultimately achieving higher compression performance; at the same time, based on the constructed network, parallel encoding and decoding processing is performed on nodes at the same level of the octree, which improves encoding and decoding efficiency and further improves compression performance.
- Figure 20 is a schematic structural diagram of an encoder.
- the encoder 10 may include: a first determination unit 11, an encoding unit 12, and a generation unit 13; wherein,
- the first determination unit 11 is configured to input the geometric information of the i-th layer node obtained by dividing the point cloud according to the octree into the first embedding network, and determine the embedded features of the i-th layer node, where i is greater than 2; input the embedded features of the i-th layer node into the encoding network to determine the latent variable of the i-th layer node; determine the reconstructed value of the residual of the i-th layer node based on the latent variable of the i-th layer node, and determine the reconstructed value of the latent variable of the i-th layer node based on the reconstructed value of the residual of the i-th layer node;
- the encoding unit 12 is configured to write the reconstructed value of the residual of the i-th layer node into the bitstream;
- the generating unit 13 is configured to input the reconstructed value of the latent variable of the i-th layer node to the first decoding network and generate a code stream of the geometric information of the i-th layer node.
- the "unit" may be part of a circuit, part of a processor, part of a program or software, etc., and of course may also be a module, or may be non-modular.
- each component in this embodiment can be integrated into one processing unit, or each unit can exist physically alone, or two or more units can be integrated into one unit.
- the above integrated units can be implemented in the form of hardware or software function modules.
- if the integrated unit is implemented in the form of a software function module and is not sold or used as an independent product, it can be stored in a computer-readable storage medium.
- on this understanding, the part of the technical solution of this embodiment that contributes to the existing technology, or all or part of the technical solution, can essentially be embodied in the form of a software product; the computer software product is stored in a storage medium and includes a number of instructions to make a computer device (which may be a personal computer, a server, a network device, etc.) or a processor execute all or part of the steps of the method described in this embodiment.
- the aforementioned storage media include: a USB flash drive, a mobile hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other media that can store program code.
- embodiments of the present application provide a computer-readable storage medium for use in the encoder 10.
- the computer-readable storage medium stores a computer program.
- when the computer program is executed by the first processor, the method described in any one of the foregoing embodiments can be implemented.
- Figure 21 is a schematic diagram 2 of the composition structure of the encoder.
- the encoder 10 may include: a first memory 14, a first processor 15, a first communication interface 16, and a first bus system 17.
- the first memory 14 , the first processor 15 , and the first communication interface 16 are coupled together through a first bus system 17 .
- the first bus system 17 is used to realize connection communication between these components.
- the first bus system 17 also includes a power bus, a control bus and a status signal bus.
- the various buses are all labeled as the first bus system 17 in Figure 21. Among them,
- the first communication interface 16 is used for receiving and sending signals during the process of sending and receiving information with other external network elements;
- the first memory 14 is used to store a computer program that can run on the first processor 15;
- the first processor 15 is configured to, when running the computer program: input the geometric information of the i-th layer node, obtained by dividing the point cloud according to the octree, into the first embedding network, and determine the embedding features of the i-th layer node, where i is greater than 2; input the embedding features of the i-th layer node into the encoding network, and determine the latent variable of the i-th layer node; determine the reconstructed value of the residual of the i-th layer node based on the latent variable of the i-th layer node, and determine the reconstructed value of the latent variable of the i-th layer node according to the reconstructed value of the residual of the i-th layer node; and input the reconstructed value of the latent variable of the i-th layer node into the first decoding network to generate a code stream of the geometric information of the i-th layer node.
- the first memory 14 in the embodiment of the present application may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memories.
- the non-volatile memory can be a read-only memory (Read-Only Memory, ROM), a programmable read-only memory (Programmable ROM, PROM), an erasable programmable read-only memory (Erasable PROM, EPROM), an electrically erasable programmable read-only memory (Electrically EPROM, EEPROM), or a flash memory.
- Volatile memory may be Random Access Memory (RAM), which is used as an external cache.
- many forms of RAM are available, such as static random access memory (Static RAM, SRAM), dynamic random access memory (Dynamic RAM, DRAM), synchronous dynamic random access memory (Synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (Double Data Rate SDRAM, DDR SDRAM), enhanced synchronous dynamic random access memory (Enhanced SDRAM, ESDRAM), synchronous link dynamic random access memory (Synchlink DRAM, SLDRAM), and direct Rambus random access memory (Direct Rambus RAM, DRRAM).
- the first memory 14 of the systems and methods described herein is intended to include, but is not limited to, these and any other suitable types of memory.
- the first processor 15 may be an integrated circuit chip with signal processing capabilities. During implementation, each step of the above method can be completed by an integrated logic circuit of hardware in the first processor 15 or by instructions in the form of software.
- the above-mentioned first processor 15 can be a general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field programmable gate array (Field Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
- a general-purpose processor may be a microprocessor or the processor may be any conventional processor, etc.
- the steps of the method disclosed in conjunction with the embodiments of the present application can be directly implemented by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor.
- the software module can be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other mature storage media in this field.
- the storage medium is located in the first memory 14.
- the first processor 15 reads the information in the first memory 14 and completes the steps of the above method in combination with its hardware.
- the embodiments described in this application can be implemented using hardware, software, firmware, middleware, microcode, or a combination thereof.
- the processing unit can be implemented in one or more application specific integrated circuits (Application Specific Integrated Circuits, ASIC), digital signal processors (Digital Signal Processing, DSP), digital signal processing devices (DSP Device, DSPD), programmable logic devices (Programmable Logic Device, PLD), field-programmable gate arrays (Field-Programmable Gate Array, FPGA), general-purpose processors, controllers, microcontrollers, microprocessors, other electronic units used to perform the functions described in this application, or a combination thereof.
- the technology described in this application can be implemented through modules (such as procedures, functions, etc.) that perform the functions described in this application.
- Software code may be stored in memory and executed by a processor.
- the memory can be implemented in the processor or external to the processor.
- the first processor 15 is further configured to perform the method described in any one of the preceding embodiments when running the computer program.
- FIG. 22 is a schematic diagram 1 of the composition structure of the decoder.
- the decoder 20 may include: a decoding unit 21 and a second determining unit 22; wherein:
- the decoding unit 21 is configured to decode the code stream
- the second determination unit 22 is configured to: determine the reconstructed value of the residual of the i-th layer node obtained by dividing the point cloud according to the octree, where i is greater than 2; determine the reconstructed value of the latent variable of the i-th layer node according to the reconstructed value of the residual of the i-th layer node; and input the reconstructed value of the latent variable of the i-th layer node into the second decoding network to determine the geometric information of the i-th layer node.
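A matching toy sketch of the decoding side; as before, the matrix `W_dec`, the dimensions, the zero prediction, and the sign-based occupancy decision are hypothetical illustrations rather than the second decoding network itself:

```python
import numpy as np

rng = np.random.default_rng(0)
W_dec = rng.standard_normal((4, 8))  # stand-in for the second decoding network

def decode_layer(residual_rec, latent_pred):
    """Reconstruct the layer-i latent from the decoded residual, then decode geometry."""
    latent_rec = latent_pred + residual_rec    # reconstructed latent variable
    logits = latent_rec @ W_dec                # second decoding network
    occupancy = (logits > 0).astype(np.uint8)  # toy per-child occupancy decision
    return occupancy

residual_rec = np.ones((5, 4))                 # as parsed from the code stream
latent_pred = np.zeros((5, 4))                 # hypothetical prediction for layer i
occupancy = decode_layer(residual_rec, latent_pred)
```

Because the decoder rebuilds the latent variable from the same reconstructed residual the encoder used, both ends arrive at identical geometry decisions.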
- a "unit" can be a part of a circuit, a part of a processor, a part of a program or software, etc., and of course it can also be a module, or it can be non-modular.
- the components in this embodiment can be integrated into a processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit.
- the above-mentioned integrated unit can be implemented in the form of hardware or in the form of a software functional module.
- if the integrated unit is implemented in the form of a software function module and is not sold or used as an independent product, it may be stored in a computer-readable storage medium.
- the technical solution of this embodiment, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product.
- the computer software product is stored in a storage medium and includes a number of instructions to cause a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to execute all or part of the steps of the method described in this embodiment.
- the aforementioned storage media include: a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, an optical disk, or other media that can store program code.
- embodiments of the present application provide a computer-readable storage medium for use in the decoder 20 .
- the computer-readable storage medium stores a computer program.
- when the computer program is executed by the second processor, the method described in any one of the foregoing embodiments can be implemented.
- Figure 23 is a schematic diagram 2 of the composition of the decoder.
- the decoder 20 may include: a second memory 23, a second processor 24, a second communication interface 25, and a second bus system 26.
- the second memory 23, the second processor 24, and the second communication interface 25 are coupled together through a second bus system 26.
- the second bus system 26 is used to realize connection communication between these components.
- the second bus system 26 also includes a power bus, a control bus and a status signal bus.
- the various buses are all labeled as the second bus system 26 in Figure 23. Among them,
- the second communication interface 25 is used for receiving and sending signals during the process of sending and receiving information with other external network elements;
- the second memory 23 is used to store a computer program that can run on the second processor 24;
- the second processor 24 is configured to, when running the computer program: decode the code stream, and determine the reconstructed value of the residual of the i-th layer node obtained by dividing the point cloud according to the octree, where i is greater than 2; determine the reconstructed value of the latent variable of the i-th layer node according to the reconstructed value of the residual of the i-th layer node; and input the reconstructed value of the latent variable of the i-th layer node into the second decoding network to determine the geometric information of the i-th layer node.
- the second memory 23 in the embodiment of the present application may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memories.
- the non-volatile memory can be a read-only memory (Read-Only Memory, ROM), a programmable read-only memory (Programmable ROM, PROM), an erasable programmable read-only memory (Erasable PROM, EPROM), an electrically erasable programmable read-only memory (Electrically EPROM, EEPROM), or a flash memory.
- Volatile memory may be Random Access Memory (RAM), which is used as an external cache.
- many forms of RAM are available, such as static random access memory (Static RAM, SRAM), dynamic random access memory (Dynamic RAM, DRAM), synchronous dynamic random access memory (Synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (Double Data Rate SDRAM, DDR SDRAM), enhanced synchronous dynamic random access memory (Enhanced SDRAM, ESDRAM), synchronous link dynamic random access memory (Synchlink DRAM, SLDRAM), and direct Rambus random access memory (Direct Rambus RAM, DRRAM).
- the second memory 23 of the systems and methods described herein is intended to include, but is not limited to, these and any other suitable types of memory.
- the second processor 24 may be an integrated circuit chip with signal processing capabilities. During implementation, each step of the above method can be completed by an integrated logic circuit of hardware in the second processor 24 or by instructions in the form of software.
- the above-mentioned second processor 24 can be a general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field programmable gate array (Field Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
- a general-purpose processor may be a microprocessor or the processor may be any conventional processor, etc.
- the steps of the method disclosed in conjunction with the embodiments of the present application can be directly implemented by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor.
- the software module can be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other mature storage media in this field.
- the storage medium is located in the second memory 23.
- the second processor 24 reads the information in the second memory 23 and completes the steps of the above method in combination with its hardware.
- the embodiments described in this application can be implemented using hardware, software, firmware, middleware, microcode, or a combination thereof.
- the processing unit can be implemented in one or more application specific integrated circuits (Application Specific Integrated Circuits, ASIC), digital signal processors (Digital Signal Processing, DSP), digital signal processing devices (DSP Device, DSPD), programmable logic devices (Programmable Logic Device, PLD), field-programmable gate arrays (Field-Programmable Gate Array, FPGA), general-purpose processors, controllers, microcontrollers, microprocessors, other electronic units used to perform the functions described in this application, or a combination thereof.
- the technology described in this application can be implemented through modules (such as procedures, functions, etc.) that perform the functions described in this application.
- Software code may be stored in memory and executed by a processor.
- the memory can be implemented in the processor or external to the processor.
- Embodiments of the present application provide an encoder and a decoder.
- latent variables can be used to represent the spatial correlation between nodes at the same level of the octree, ultimately achieving higher compression performance; at the same time, based on the constructed networks, parallel encoding and decoding is performed on nodes at the same level of the octree, which improves encoding and decoding efficiency and further improves compression performance.
- Embodiments of the present application provide an encoding and decoding method, an encoder, a decoder, and a storage medium.
- the encoder includes a first embedding network, an encoding network, and a first decoding network.
- the encoder inputs the geometric information of the i-th layer node, obtained by dividing the point cloud according to the octree, into the first embedding network to determine the embedding features of the i-th layer node, where i is greater than 2; inputs the embedding features of the i-th layer node into the encoding network to determine the latent variable of the i-th layer node; determines the reconstructed value of the residual of the i-th layer node based on the latent variable of the i-th layer node, and determines the reconstructed value of the latent variable of the i-th layer node based on the reconstructed value of the residual of the i-th layer node; and inputs the reconstructed value of the latent variable of the i-th layer node into the first decoding network to generate a code stream of the geometric information of the i-th layer node.
- the decoder includes a second decoding network.
- the decoder decodes the code stream and determines the reconstructed value of the residual of the i-th layer node obtained by dividing the point cloud according to the octree, where i is greater than 2; determines the reconstructed value of the latent variable of the i-th layer node according to the reconstructed value of the residual of the i-th layer node; and inputs the reconstructed value of the latent variable of the i-th layer node into the second decoding network to determine the geometric information of the i-th layer node.
- latent variables can be used to represent the spatial correlation between nodes at the same level of the octree, ultimately achieving higher compression performance; at the same time, based on the constructed networks, parallel encoding and decoding is performed on nodes at the same level of the octree, which improves encoding and decoding efficiency and further improves compression performance.
Description
The embodiments of the present application relate to the field of point cloud compression technology, and in particular, to an encoding and decoding method, an encoder, a decoder, and a storage medium.
Currently, OctSqueeze is a common entropy model applied to laser radar (Laser Radar, LIDAR) point clouds; OctSqueeze is a LIDAR point cloud octree entropy model based on ancestor nodes, and OctAttention, an extension of the OctSqueeze technique, is a LIDAR point cloud octree entropy model based on ancestor and sibling nodes.
However, the OctSqueeze scheme assumes that neighbor nodes are conditionally independent given their parent node, which is usually an incorrect assumption and affects encoding and decoding efficiency. In the OctAttention scheme, since the context window keeps sliding during decoding, the model must be run repeatedly to update the semantic information of the context window; this process cannot be parallelized and therefore has extremely high time complexity.
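The entropy models discussed here all share the same accounting: the ideal cost of coding a node's occupancy symbol is -log2 of the probability the context model assigns to it, so better conditioning (ancestors, siblings, or the latent variables of this application) means fewer bits. A minimal sketch with made-up probabilities:

```python
import math

# Hypothetical probabilities that some context model assigns to the
# occupancy symbols actually observed, one per octree node.
symbol_probs = [0.50, 0.25, 0.125, 0.125]

# Ideal entropy-coding cost in bits: sum of -log2(p) over the coded symbols.
bits = sum(-math.log2(p) for p in symbol_probs)  # 1 + 2 + 3 + 3 = 9 bits
```

A model that conditions on more context pushes each probability closer to 1 for the symbol that actually occurs, shrinking this sum.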
Summary of the Invention
The embodiments of the present application provide an encoding and decoding method, an encoder, a decoder, and a storage medium, which can improve the efficiency of point cloud encoding and decoding while improving compression performance.
The technical solutions of the embodiments of this application can be implemented as follows:
In a first aspect, embodiments of the present application provide an encoding method applied to an encoder. The encoder includes a first embedding network, an encoding network, and a first decoding network. The method includes:
inputting the geometric information of the i-th layer node, obtained by dividing the point cloud according to the octree, into the first embedding network, and determining the embedding features of the i-th layer node, where i is greater than 2;
inputting the embedding features of the i-th layer node into the encoding network, and determining the latent variable of the i-th layer node;
determining the reconstructed value of the residual of the i-th layer node according to the latent variable of the i-th layer node, and determining the reconstructed value of the latent variable of the i-th layer node according to the reconstructed value of the residual of the i-th layer node;
inputting the reconstructed value of the latent variable of the i-th layer node into the first decoding network, and generating a code stream of the geometric information of the i-th layer node.
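The residual round-trip in the steps above can be illustrated numerically. The prediction vector and rounding-as-quantisation below are assumptions for illustration; the application itself leaves the exact residual computation to the described networks:

```python
import numpy as np

latent = np.array([1.7, -0.3, 2.2])     # latent variable of a layer-i node
prediction = np.array([1.0, 0.0, 2.0])  # hypothetical prediction, e.g. from layer i-1

residual = latent - prediction          # residual to be written to the code stream
residual_rec = np.round(residual)       # reconstructed residual after quantisation
latent_rec = prediction + residual_rec  # reconstructed latent variable
```

Because both ends rebuild the latent variable from the same reconstructed residual, encoder and decoder stay in sync even though the raw residual was quantised.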
In a second aspect, embodiments of the present application provide a decoding method applied to a decoder. The decoder includes a second decoding network. The method includes:
decoding a code stream, and determining the reconstructed value of the residual of the i-th layer node of the point cloud, where i is greater than 2;
determining the reconstructed value of the latent variable of the i-th layer node according to the reconstructed value of the residual of the i-th layer node;
inputting the reconstructed value of the latent variable of the i-th layer node into the second decoding network, and determining the geometric information of the i-th layer node.
In a third aspect, embodiments of the present application provide an encoder. The encoder includes a first determination unit, an encoding unit, and a generating unit; wherein,
the first determination unit is configured to: input the geometric information of the i-th layer node, obtained by dividing the point cloud according to the octree, into the first embedding network, and determine the embedding features of the i-th layer node, where i is greater than 2; input the embedding features of the i-th layer node into the encoding network, and determine the latent variable of the i-th layer node; and determine the reconstructed value of the residual of the i-th layer node according to the latent variable of the i-th layer node, and determine the reconstructed value of the latent variable of the i-th layer node according to the reconstructed value of the residual of the i-th layer node;
the encoding unit is configured to write the reconstructed value of the residual of the i-th layer node into the code stream;
the generating unit is configured to input the reconstructed value of the latent variable of the i-th layer node into the first decoding network, and generate a code stream of the geometric information of the i-th layer node.
In a fourth aspect, embodiments of the present application provide an encoder. The encoder includes a first memory and a first processor; wherein,
the first memory is used to store a computer program capable of running on the first processor;
the first processor is configured to execute the encoding method described above when running the computer program.
In a fifth aspect, embodiments of the present application provide a decoder. The decoder includes a decoding unit and a second determination unit; wherein,
the decoding unit is configured to decode a code stream;
the second determination unit is configured to: determine the reconstructed value of the residual of the i-th layer node obtained by dividing the point cloud according to the octree, where i is greater than 2; determine the reconstructed value of the latent variable of the i-th layer node according to the reconstructed value of the residual of the i-th layer node; and input the reconstructed value of the latent variable of the i-th layer node into the second decoding network to determine the geometric information of the i-th layer node.
In a sixth aspect, embodiments of the present application provide a decoder. The decoder includes a second memory and a second processor; wherein,
the second memory is used to store a computer program capable of running on the second processor;
the second processor is configured to execute the decoding method described above when running the computer program.
In a seventh aspect, embodiments of the present application provide a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and when the computer program is executed, the decoding method described above or the encoding method described above is implemented.
Embodiments of the present application provide an encoding and decoding method, an encoder, a decoder, and a storage medium. At the encoding end, the encoder includes a first embedding network, an encoding network, and a first decoding network. The encoder inputs the geometric information of the i-th layer node, obtained by dividing the point cloud according to the octree, into the first embedding network and determines the embedding features of the i-th layer node, where i is greater than 2; inputs the embedding features of the i-th layer node into the encoding network and determines the latent variable of the i-th layer node; determines the reconstructed value of the residual of the i-th layer node according to the latent variable of the i-th layer node, and determines the reconstructed value of the latent variable of the i-th layer node according to the reconstructed value of the residual of the i-th layer node; and inputs the reconstructed value of the latent variable of the i-th layer node into the first decoding network to generate a code stream of the geometric information of the i-th layer node. At the decoding end, the decoder includes a second decoding network. The decoder decodes the code stream and determines the reconstructed value of the residual of the i-th layer node of the point cloud, where i is greater than 2; determines the reconstructed value of the latent variable of the i-th layer node according to the reconstructed value of the residual of the i-th layer node; and inputs the reconstructed value of the latent variable of the i-th layer node into the second decoding network to determine the geometric information of the i-th layer node. It can be seen that, in the embodiments of the present application, when encoding and decoding a point cloud, based on the constructed octree structure, latent variables can be used to represent the spatial correlation between nodes at the same level of the octree, ultimately achieving higher compression performance; at the same time, based on the constructed networks, parallel encoding and decoding is performed on nodes at the same level of the octree, which improves encoding and decoding efficiency and further improves compression performance.
Figure 1 is a schematic diagram of a point cloud encoding and decoding network architecture provided by an embodiment of the present application;
Figure 2 is a schematic flowchart 1 of the implementation of the encoding method proposed by an embodiment of the present application;
Figure 3 is a schematic diagram of the network structure of the embedding network;
Figure 4 is a schematic diagram of the implementation of the embedding network;
Figure 5 is a schematic diagram of the network structure of the encoding network;
Figure 6 is a schematic diagram of the implementation of the encoding network;
Figure 7 is a schematic diagram of the network structure of the subtraction network;
Figure 8 is a schematic diagram of the implementation of the subtraction network;
Figure 9 is a schematic diagram of the network structure of the addition network;
Figure 10 is a schematic diagram of the implementation of the addition network;
Figure 11 is a schematic diagram 1 of the network structure of the decoding network;
Figure 12 is a schematic diagram 1 of the implementation of the decoding network;
Figure 13 is a schematic flowchart 2 of the implementation of the encoding method proposed by an embodiment of the present application;
Figure 14 is a schematic diagram 2 of the network structure of the decoding network;
Figure 15 is a schematic diagram 2 of the implementation of the decoding network;
Figure 16 is a schematic diagram of the overall framework for executing the encoding method;
Figure 17 is a schematic flowchart 1 of the implementation of the decoding method proposed by an embodiment of the present application;
Figure 18 is a schematic flowchart 2 of the implementation of the decoding method proposed by an embodiment of the present application;
Figure 19 is a schematic diagram of the overall framework for executing the decoding method;
Figure 20 is a schematic diagram 1 of the composition structure of the encoder;
Figure 21 is a schematic diagram 2 of the composition structure of the encoder;
Figure 22 is a schematic diagram 1 of the composition structure of the decoder;
Figure 23 is a schematic diagram 2 of the composition structure of the decoder.
In order to understand the characteristics and technical content of the embodiments of the present application in more detail, the implementation of the embodiments of the present application will be described in detail below with reference to the accompanying drawings. The attached drawings are for reference and illustration only and are not intended to limit the embodiments of the present application.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those of ordinary skill in the technical field to which this application belongs. The terms used herein are only for the purpose of describing the embodiments of the present application and are not intended to limit the present application.
In the following description, reference is made to "some embodiments", which describe a subset of all possible embodiments; it can be understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and they can be combined with each other without conflict.
It should also be pointed out that the terms "first/second/third" in the embodiments of this application are only used to distinguish similar objects and do not represent a specific ordering of the objects. It can be understood that "first/second/third" may be interchanged in a specific order or sequence where permitted, so that the embodiments of the application described herein can be implemented in an order other than that illustrated or described herein.
点云(Point Cloud)是物体表面的三维表现形式,通过光电雷达、激光雷达、激光扫描仪、多视角相机等采集设备,可以采集得到物体表面的点云(数据)。Point Cloud is a three-dimensional representation of the surface of an object. Through collection equipment such as photoelectric radar, lidar, laser scanner, and multi-view camera, the point cloud (data) of the surface of the object can be collected.
点云是空间中一组无规则分布的、表达三维物体或场景的空间结构及表面属性的离散点集,图1A展示了三维点云图像和图1B展示了三维点云图像的局部放大图,可以看到点云表面是由分布稠密的点所组成的。Point cloud is a set of discrete points randomly distributed in space that expresses the spatial structure and surface properties of a three-dimensional object or scene. Figure 1A shows a three-dimensional point cloud image and Figure 1B shows a partial enlargement of the three-dimensional point cloud image. It can be seen that the point cloud surface is composed of densely distributed points.
二维图像在每一个像素点均有信息表达,分布规则,因此不需要额外记录其位置信息;然而点云中的点在三维空间中的分布具有随机性和不规则性,因此需要记录每一个点在空间中的位置,才能完整地表达一幅点云。与二维图像类似,采集过程中每一个位置均有对应的属性信息,通常为RGB颜色值,颜色值反映物体的色彩;对于点云来说,每一个点所对应的属性信息除了颜色信息以外,还有比较常见的是反射率(reflectance)值,反射率值反映物体的表面材质。因此,点云中的点可以包括点的位置信息和点的属性信息。例如,点的位置信息可以是点的三维坐标信息(x,y,z)。点的位置信息也可称为点的几何信息。例如,点的属性信息可以包括颜色信息(三维颜色信息)和/或反射率(一维反射率信息r)等等。例如,颜色信息可以是任意一种色彩空间上的信息。例如,颜色信息可以是RGB信息。其中,R表示红色(Red,R),G表示绿色(Green,G),B表示蓝色(Blue,B)。再如,颜色信息可以是亮度色度(YCbCr,YUV)信息。其中,Y表示明亮度(Luma),Cb(U)表示蓝色色差,Cr(V)表示红色色差。Two-dimensional images have information expression at each pixel point, and the distribution is regular, so there is no need to record its position information additionally; however, the distribution of points in point clouds in three-dimensional space is random and irregular, so it is necessary to record the position of each point in space in order to fully express a point cloud. Similar to two-dimensional images, each position in the acquisition process has corresponding attribute information, usually RGB color values, and the color value reflects the color of the object; for point clouds, in addition to color information, the attribute information corresponding to each point is also commonly the reflectance value, which reflects the surface material of the object. Therefore, the points in the point cloud can include the position information of the point and the attribute information of the point. For example, the position information of the point can be the three-dimensional coordinate information (x, y, z) of the point. The position information of the point can also be called the geometric information of the point. For example, the attribute information of the point can include color information (three-dimensional color information) and/or reflectance (one-dimensional reflectance information r), etc. For example, the color information can be information on any color space. For example, the color information can be RGB information. 
Among them, R represents red (Red, R), G represents green (Green, G), and B represents blue (Blue, B). For another example, the color information may be luminance and chrominance (YCbCr, YUV) information, where Y represents brightness (Luma), Cb (U) represents blue color difference, and Cr (V) represents red color difference.
根据激光测量原理得到的点云,点云中的点可以包括点的三维坐标信息和点的反射率值。再如,根据摄影测量原理得到的点云,点云中的点可以包括点的三维坐标信息和点的三维颜色信息。再如,结合激光测量和摄影测量原理得到点云,点云中的点可以包括点的三维坐标信息、点的反射率值和点的三维颜色信息。For a point cloud obtained based on the principle of laser measurement, the points in the point cloud may include the three-dimensional coordinate information of the points and the reflectance values of the points. For another example, for a point cloud obtained according to the principle of photogrammetry, the points may include the three-dimensional coordinate information and the three-dimensional color information of the points. For another example, for a point cloud obtained by combining the principles of laser measurement and photogrammetry, the points may include the three-dimensional coordinate information, the reflectance values, and the three-dimensional color information of the points.
点云可以按获取的途径分为:Point clouds can be divided, according to the way they are acquired, into:
静态点云:即物体是静止的,获取点云的设备也是静止的;Static point cloud: that is, the object is stationary and the device that obtains the point cloud is also stationary;
动态点云:物体是运动的,但获取点云的设备是静止的;Dynamic point cloud: The object is moving, but the device that obtains the point cloud is stationary;
动态获取点云:获取点云的设备是运动的。Dynamically acquire point clouds: The device that acquires point clouds is in motion.
例如,按点云的用途分为两大类:For example, point clouds are divided into two categories according to their uses:
类别一:机器感知点云,其可以用于自主导航系统、实时巡检系统、地理信息系统、视觉分拣机器人、抢险救灾机器人等场景;Category 1: Machine perception point cloud, which can be used in scenarios such as autonomous navigation systems, real-time inspection systems, geographic information systems, visual sorting robots, and rescue and disaster relief robots;
类别二:人眼感知点云,其可以用于数字文化遗产、自由视点广播、三维沉浸通信、三维沉浸交互等点云应用场景。Category 2: Human eye perception point cloud, which can be used in point cloud application scenarios such as digital cultural heritage, free-viewpoint broadcasting, three-dimensional immersive communication, and three-dimensional immersive interaction.
点云可以灵活方便地表达三维物体或场景的空间结构及表面属性,并且由于点云通过直接对真实物体采样获得,在保证精度的前提下能提供极强的真实感,因而应用广泛,其范围包括虚拟现实游戏、计算机辅助设计、地理信息系统、自动导航系统、数字文化遗产、自由视点广播、三维沉浸远程呈现、生物组织器官三维重建等。Point clouds can flexibly and conveniently express the spatial structure and surface properties of three-dimensional objects or scenes, and because point clouds are obtained by directly sampling real objects, they can provide a strong sense of realism while ensuring accuracy. They are therefore widely used, in applications including virtual reality games, computer-aided design, geographic information systems, automatic navigation systems, digital cultural heritage, free-viewpoint broadcasting, three-dimensional immersive telepresence, three-dimensional reconstruction of biological tissues and organs, and so on.
点云的采集主要有以下途径:计算机生成、3D激光扫描、3D摄影测量等。计算机可以生成虚拟三维物体及场景的点云;3D激光扫描可以获得静态现实世界三维物体或场景的点云,每秒可以获取百万级点云;3D摄影测量可以获得动态现实世界三维物体或场景的点云,每秒可以获取千万级点云。这些技术降低了点云数据获取成本和时间周期,提高了数据的精度。点云数据获取方式的变革,使大量点云数据的获取成为可能,伴随着应用需求的增长,海量3D点云数据的处理遭遇存储空间和传输带宽限制的瓶颈。Point cloud collection is mainly carried out in the following ways: computer generation, 3D laser scanning, 3D photogrammetry, etc. Computers can generate point clouds of virtual three-dimensional objects and scenes; 3D laser scanning can obtain point clouds of static real-world three-dimensional objects or scenes, acquiring on the order of a million points per second; 3D photogrammetry can obtain point clouds of dynamic real-world three-dimensional objects or scenes, acquiring on the order of ten million points per second. These technologies reduce the cost and time of point cloud data acquisition and improve the accuracy of the data. Changes in the way point cloud data is acquired have made it possible to obtain large amounts of point cloud data. With the growth of application requirements, the processing of massive 3D point cloud data has run into bottlenecks of storage space and transmission bandwidth.
示例性地,以帧率为30帧每秒(fps)的点云视频为例,每帧点云的点数为70万,每个点具有坐标信息xyz(float)和颜色信息RGB(uchar),则10s点云视频的数据量大约为0.7million×(4Byte×3+1Byte×3)×30fps×10s=3.15GB,其中,1Byte为8bit,而YUV采样格式为4:2:0,帧率为24fps的1280×720二维视频,其10s的数据量约为1280×720×12bit×24fps×10s≈0.33GB,10s的两视角三维视频的数据量约为0.33×2=0.66GB。由此可见,点云视频的数据量远超过相同时长的二维视频和三维视频的数据量。因此,为更好地实现数据管理,节省服务器存储空间,降低服务器与客户端之间的传输流量及传输时间,点云压缩成为促进点云产业发展的关键问题。For example, take a point cloud video with a frame rate of 30 frames per second (fps), where each frame has 700,000 points and each point carries coordinate information xyz (float) and color information RGB (uchar). The data volume of a 10 s point cloud video is then approximately 0.7 million × (4 Byte × 3 + 1 Byte × 3) × 30 fps × 10 s = 3.15 GB, where 1 Byte is 8 bit. For comparison, a 1280×720 two-dimensional video in YUV 4:2:0 sampling format at 24 fps has a 10 s data volume of about 1280×720×12 bit×24 fps×10 s ≈ 0.33 GB, and a 10 s two-view three-dimensional video is about 0.33×2 = 0.66 GB. It can be seen that the data volume of point cloud video far exceeds that of two-dimensional and three-dimensional video of the same duration. Therefore, in order to better realize data management, save server storage space, and reduce transmission traffic and transmission time between server and client, point cloud compression has become a key issue in promoting the development of the point cloud industry.
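上述数据量的估算可以用几行Python代码核对(仅为示意性的算术核对,其中GB按10^9字节计,与上文数值口径一致):The data-volume estimates above can be checked with a few lines of Python (an illustrative arithmetic check only; GB is taken as 10^9 bytes, consistent with the figures in the text):

```python
# Arithmetic check of the data-volume figures in the text (1 GB = 1e9 bytes).
points_per_frame = 700_000
bytes_per_point = 4 * 3 + 1 * 3              # xyz as float (4 B each) + RGB as uchar (1 B each)
pc_bytes = points_per_frame * bytes_per_point * 30 * 10   # 30 fps, 10 s
assert pc_bytes == 3_150_000_000             # = 3.15 GB

# 1280x720 YUV 4:2:0 video: 12 bits per pixel, 24 fps, 10 s
video_bytes = 1280 * 720 * 12 * 24 * 10 // 8
assert round(video_bytes / 1e9, 2) == 0.33   # ~0.33 GB; two views ~0.66 GB
```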
也就是说,由于点云是海量点的集合,存储点云不仅会消耗大量的内存,而且不利于传输,也没有这么大的带宽可以支持将点云不经过压缩直接在网络层进行传输,因此,需要对点云进行压缩。In other words, since a point cloud is a collection of a massive number of points, storing the point cloud not only consumes a large amount of memory but is also not conducive to transmission, and there is no bandwidth large enough to support transmitting the point cloud directly at the network layer without compression. Therefore, the point cloud needs to be compressed.
目前,可对点云进行压缩的点云编码框架可以是运动图像专家组(Moving Picture Experts Group,MPEG)提供的基于几何的点云压缩(Geometry-based Point Cloud Compression,G-PCC)编解码框架或基于视频的点云压缩(Video-based Point Cloud Compression,V-PCC)编解码框架,也可以是AVS提供的AVS-PCC编解码框架。G-PCC编解码框架可用于针对第一类静态点云和第三类动态获取点云进行压缩,V-PCC编解码框架可用于针对第二类动态点云进行压缩。G-PCC编解码框架也称为点云编解码器TMC13,V-PCC编解码框架也称为点云编解码器TMC2。Currently, the point cloud coding frameworks that can compress point clouds include the Geometry-based Point Cloud Compression (G-PCC) codec framework and the Video-based Point Cloud Compression (V-PCC) codec framework provided by the Moving Picture Experts Group (MPEG), as well as the AVS-PCC codec framework provided by AVS. The G-PCC codec framework can be used to compress the first category (static point clouds) and the third category (dynamically acquired point clouds), and the V-PCC codec framework can be used to compress the second category (dynamic point clouds). The G-PCC codec framework is also called point cloud codec TMC13, and the V-PCC codec framework is also called point cloud codec TMC2.
本申请实施例提供了一种包含解码方法和编码方法的点云编解码系统的网络架构,图1为本申请实施例提供的一种点云编解码的网络架构示意图。如图1所示,该网络架构包括一个或多个电子设备13至1N和通信网络01,其中,电子设备13至1N可以通过通信网络01进行视频交互。电子设备在实施的过程中可以为各种类型的具有点云编解码功能的设备,例如,所述电子设备可以包括手机、平板电脑、个人计算机、个人数字助理、导航仪、数字电话、视频电话、电视机、传感设备、服务器等,本申请实施例不作限制。其中,本申请实施例中的解码器或编码器就可以为上述电子设备。The embodiments of the present application provide a network architecture of a point cloud encoding and decoding system including a decoding method and an encoding method. Figure 1 is a schematic diagram of the network architecture of point cloud encoding and decoding provided by an embodiment of the present application. As shown in Figure 1, the network architecture includes one or more electronic devices 13 to 1N and a communication network 01, where the electronic devices 13 to 1N can perform video interaction through the communication network 01. In implementation, the electronic device may be any of various types of devices with point cloud encoding and decoding functions; for example, the electronic device may include a mobile phone, a tablet computer, a personal computer, a personal digital assistant, a navigator, a digital telephone, a video telephone, a television, a sensing device, a server, and so on, which is not limited by the embodiments of the present application. The decoder or encoder in the embodiments of the present application may be the above-mentioned electronic device.
其中,本申请实施例中的电子设备具有点云编解码功能,一般包括点云编码器(即编码器)和点云解码器(即解码器)。Among them, the electronic device in the embodiment of the present application has a point cloud encoding and decoding function, and generally includes a point cloud encoder (ie, encoder) and a point cloud decoder (ie, decoder).
在本申请的实施例中,OctSqueeze为基于祖先节点的LIDAR点云八叉树熵模型,该技术应用于LIDAR点云熵模型,主要由以下部分构成:In the embodiment of this application, OctSqueeze is a LIDAR point cloud octree entropy model based on ancestor nodes. This technology is applied to the LIDAR point cloud entropy model and mainly consists of the following parts:
(1)、编码器(1), encoder
编码器首先对点云进行八叉树构建。之后对于八叉树的每个节点,选取其深度,父节点占用码,坐标作为上下文信息,通过一个多层感知机网络得到其对应的特征。此后进行k次迭代,对于第k次迭代,拼接当前节点的特征与其父节点的特征,并通过第k个多层感知机得到当前节点在第k次迭代的特征。对于每个节点第k次迭代后的特征,通过一个256维的softmax层,得到每种占用码的概率,并使用算术编码与当前占用码的概率来无损编码当前占用码。The encoder first performs octree construction on the point cloud. Then, for each node of the octree, its depth, parent node occupancy code, and coordinates are selected as context information, and its corresponding features are obtained through a multi-layer perceptron network. After that, k iterations are performed. For the k-th iteration, the characteristics of the current node and the characteristics of its parent node are spliced, and the characteristics of the current node at the k-th iteration are obtained through the k-th multi-layer perceptron. For the features after the k-th iteration of each node, the probability of each occupancy code is obtained through a 256-dimensional softmax layer, and the current occupancy code is losslessly encoded using arithmetic coding and the probability of the current occupancy code.
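上述“上下文经多层感知机、再经softmax层得到256种占用码概率”的步骤可以用如下代码示意(仅为示意性草图,并非OctSqueeze的原始实现;其中上下文维度与隐藏层宽度均为假设值):The step above, in which context features pass through a multi-layer perceptron and a softmax layer to yield probabilities over the 256 occupancy codes, can be sketched as follows (an illustrative sketch, not the original OctSqueeze implementation; the context dimension and hidden width are assumed values):

```python
import numpy as np

# Minimal sketch: a node's context (depth, parent occupancy code, xyz
# coordinates) goes through an MLP, then a 256-way softmax gives the
# occupancy-code probabilities used by the arithmetic coder.
rng = np.random.default_rng(0)
ctx_dim, hidden = 5, 32                      # assumed sizes, for illustration
W1, b1 = rng.normal(size=(ctx_dim, hidden)), np.zeros(hidden)
W2, b2 = rng.normal(size=(hidden, 256)), np.zeros(256)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)    # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

ctx = rng.normal(size=(4, ctx_dim))          # context vectors of 4 nodes
h = np.maximum(ctx @ W1 + b1, 0)             # ReLU hidden layer
probs = softmax(h @ W2 + b2)                 # one 256-dim distribution per node
assert probs.shape == (4, 256)
assert np.allclose(probs.sum(axis=-1), 1.0)
```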
(2)、码流(2) Code stream
OctSqueeze技术的码流由每层节点占用码对应的码流组成。The code stream of OctSqueeze technology consists of the code stream corresponding to the occupied code of each layer of nodes.
(3)、解码器(3), decoder
解码器以层为单位,从低层到高层次序解码八叉树。解码过程与编码过程除节点顺序外完全相同。The decoder decodes the octree sequentially from lower layers to higher layers in units of layers. The decoding process is exactly the same as the encoding process except for the node order.
(4)、损失函数(4), loss function
损失函数为预测节点概率与真实占用码的交叉熵,如下公式:The loss function is the cross entropy between the predicted node probabilities and the true occupancy codes, given by the following formula:

Loss = -∑_{i=1}^{N} ∑_{j=1}^{256} 1(x_i = j)·log p_{ij}

其中,N为八叉树节点数,x_i表示第i个节点,j为256维中的维度,p_{ij}则为对应的占用码的概率。Among them, N is the number of octree nodes, x_i denotes the i-th node, j indexes the 256 dimensions, and p_{ij} is the probability of the corresponding occupancy code.
在本申请的实施例中,OctAttention为基于祖先兄弟节点的LIDAR点云八叉树熵模型,该技术应用于LIDAR点云熵模型,是对OctSqueeze技术的拓展,主要由以下部分构成:In the embodiment of this application, OctAttention is a LIDAR point cloud octree entropy model based on ancestor sibling nodes. This technology is applied to the LIDAR point cloud entropy model and is an expansion of the OctSqueeze technology. It mainly consists of the following parts:
(1)、编码器(1), encoder
该技术按广度优先顺序编解码点云,使用一个上下文窗口,用于储存已编码/解码的节点。对于当前待编码节点,使用基于注意力的网络提取上下文窗口的信息,并使用softmax函数得到当前节点各占用码的概率。使用算术编码无损编码当前节点,并将当前节点加入上下文窗口中。This technique encodes and decodes point clouds in breadth-first order, using a context window to store encoded/decoded nodes. For the current node to be encoded, an attention-based network is used to extract the information of the context window, and the softmax function is used to obtain the probability of each occupied code of the current node. Losslessly encode the current node using arithmetic coding and add the current node to the context window.
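上述流程(对上下文窗口做注意力、经softmax得到占用码概率、编码后将当前节点加入窗口)可以示意如下(仅为示意性草图,并非OctAttention的原始实现;窗口长度与特征维度均为假设值):The flow described above (attending over the context window, obtaining occupancy-code probabilities via softmax, and appending the current node to the window after coding) can be sketched as follows (an illustrative sketch, not the original OctAttention implementation; the window length and feature dimension are assumed values):

```python
import numpy as np

# Sketch of the context-window idea: the node being coded attends over the
# last W already-coded nodes, and the attended feature is mapped to a
# 256-way occupancy-code distribution.
rng = np.random.default_rng(1)
W, d = 8, 16                                  # window length / feature dim (assumed)
window = rng.normal(size=(W, d))              # features of already-coded nodes
query = rng.normal(size=(1, d))               # feature of the node being coded

scores = query @ window.T / np.sqrt(d)        # scaled dot-product attention
attn = np.exp(scores - scores.max())
attn = attn / attn.sum()
context = attn @ window                       # (1, d) summary of the window

Wo = rng.normal(size=(d, 256))
logits = context @ Wo
probs = np.exp(logits - logits.max())
probs = probs / probs.sum()                   # softmax over 256 occupancy codes
assert probs.shape == (1, 256) and np.isclose(probs.sum(), 1.0)

# After arithmetic-coding the node with probs, it is appended to the window:
window = np.vstack([window[1:], context])     # slide the window forward
assert window.shape == (W, d)
```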
(2)、码流(2), code stream
该技术的码流由按广度优先顺序排列的各个节点的码流组成。The code stream of this technology consists of the code streams of each node arranged in breadth-first order.
(3)、解码器(3), decoder
解码过程与编码过程一致。The decoding process is the same as the encoding process.
(4)、损失函数(4), loss function
损失函数为预测节点概率与真实占用码的交叉熵,如下公式:The loss function is the cross entropy between the predicted node probabilities and the true occupancy codes, given by the following formula:

Loss = -∑_{i=1}^{N} ∑_{j=1}^{256} 1(x_i = j)·log p_{ij}

其中,N为八叉树节点数,x_i表示第i个节点,j为256维中的维度,p_{ij}则为对应的占用码的概率。Among them, N is the number of octree nodes, x_i denotes the i-th node, j indexes the 256 dimensions, and p_{ij} is the probability of the corresponding occupancy code.
可以理解的是,上述OctSqueeze方案是以假设邻居节点在已知父节点的情况下条件独立,这通常是错误的假设,会导致编码效率次优。而上述OctAttention方案中,由于上下文窗口在解码过程中不断滑动,因此需要不断运行模型以更新上下文窗口的语义信息,此过程不具备并行性,因此具有极高的时间复杂度。It is understandable that the above-mentioned OctSqueeze scheme assumes that neighbor nodes are conditionally independent when the parent node is known. This is usually a wrong assumption and will lead to suboptimal coding efficiency. In the above OctAttention scheme, since the context window keeps sliding during the decoding process, the model needs to be continuously run to update the semantic information of the context window. This process is not parallel and therefore has extremely high time complexity.
由此可见,常见的点云编解码方法,存在复杂度高的缺陷,影响了压缩性能,同时在一定程度上降低了编解码效率。It can be seen that common point cloud encoding and decoding methods have the drawback of high complexity, which affects compression performance and at the same time reduces encoding and decoding efficiency to a certain extent.
为了解决上述问题,本申请实施例提供了一种编解码方法、编码器、解码器以及存储介质,在编码端,编码器包括第一嵌入网络、编码网络、第一解码网络,编码器将点云依八叉树划分得到的第i层节点的几何信息输入至第一嵌入网络,确定第i层节点的嵌入特征;其中,i大于2;将第i层节点的嵌入特征输入至编码网络,确定第i层节点的潜变量;根据第i层节点的潜变量确定第i层节点的残差的重建值,并根据第i层节点的残差的重建值确定第i层节点的潜变量的重建值;将第i层节点的潜变量的重建值输入至第一解码网络,生成第i层节点的几何信息的码流。在解码端,解码器包括第二解码网络,解码器解码码流,确定点云依八叉树划分得到的第i层节点的残差的重建值;其中,i大于2;根据第i层节点的残差的重建值确定第i层节点的潜变量的重建值;将第i层节点的潜变量的重建值输入至第二解码网络,确定第i层节点的几何信息。由此可见,在本申请的实施例中,在对点云进行编解码时,基于构建的八叉树结构,可以使用潜变量来表征八叉树中的同一层级的节点之间的空间相关性,最终能够获得更高的压缩性能;同时,基于构建的网络对八叉树的相同层级的节点执行并行编解码处理,实现了编解码效率的提高,进一步提升了压缩性能。In order to solve the above problems, embodiments of the present application provide an encoding and decoding method, an encoder, a decoder, and a storage medium. On the encoding side, the encoder includes a first embedding network, an encoding network, and a first decoding network. The encoder inputs the geometric information of the i-th layer nodes, obtained by dividing the point cloud according to an octree, into the first embedding network to determine the embedding features of the i-th layer nodes, where i is greater than 2; inputs the embedding features of the i-th layer nodes into the encoding network to determine the latent variables of the i-th layer nodes; determines the reconstructed values of the residuals of the i-th layer nodes according to the latent variables of the i-th layer nodes, and determines the reconstructed values of the latent variables of the i-th layer nodes according to the reconstructed values of the residuals; and inputs the reconstructed values of the latent variables of the i-th layer nodes into the first decoding network to generate a code stream of the geometric information of the i-th layer nodes. On the decoding side, the decoder includes a second decoding network. The decoder decodes the code stream to determine the reconstructed values of the residuals of the i-th layer nodes obtained by dividing the point cloud according to the octree, where i is greater than 2; determines the reconstructed values of the latent variables of the i-th layer nodes according to the reconstructed values of the residuals; and inputs the reconstructed values of the latent variables of the i-th layer nodes into the second decoding network to determine the geometric information of the i-th layer nodes. It can be seen that, in the embodiments of the present application, when encoding and decoding a point cloud, latent variables can be used, based on the constructed octree structure, to represent the spatial correlation between nodes at the same level of the octree, ultimately achieving higher compression performance; at the same time, parallel encoding and decoding are performed on nodes at the same level of the octree based on the constructed networks, which improves encoding and decoding efficiency and further improves compression performance.
下面将结合附图对本申请各实施例进行详细说明。Each embodiment of the present application will be described in detail below with reference to the accompanying drawings.
本申请的实施例提出一种点云编码方法,该编码方法可以应用于编码器,其中,编码器中可以包括有第一嵌入网络、编码网络以及第一解码网络。The embodiment of the present application proposes a point cloud encoding method, which can be applied to an encoder, where the encoder can include a first embedding network, an encoding network, and a first decoding network.
需要说明的是,在本申请的实施例中,编码器可以设置有多层网络,其中,多层网络中的至少一层网络可以包括第一嵌入网络、编码网络、第一解码网络。It should be noted that in embodiments of the present application, the encoder may be provided with a multi-layer network, where at least one layer of the multi-layer network may include a first embedding network, an encoding network, and a first decoding network.
示例性的,在本申请的实施例中,编码器可以设置有5层网络,其中,5层网络中的其中一层网络或者多层网络,可以包括有第一嵌入网络、编码网络以及第一解码网络。Exemplarily, in the embodiment of the present application, the encoder may be provided with a 5-layer network, where one layer of the network or a multi-layer network in the 5-layer network may include a first embedding network, a coding network and a first Decoding network.
图2为本申请实施例提出的编码方法的实现流程示意图一,如图2所示,编码器进行编码处理的方法可以包括以下步骤:Figure 2 is a schematic flow chart of the implementation of the encoding method proposed by the embodiment of the present application. As shown in Figure 2, the method for the encoder to perform encoding processing may include the following steps:
步骤101、将点云依八叉树划分得到的第i层节点的几何信息输入至第一嵌入网络,确定第i层节点的嵌入特征;其中,i大于2。Step 101: Input the geometric information of the i-th layer node obtained by dividing the point cloud according to the octree into the first embedding network to determine the embedding characteristics of the i-th layer node; where i is greater than 2.
在本申请的实施例中,编码器可以先将点云依八叉树划分得到的第i层节点的几何信息输入至第一嵌入网络,从而可以确定第i层节点的嵌入特征。In the embodiment of the present application, the encoder can first input the geometric information of the i-th layer node obtained by dividing the point cloud according to the octree into the first embedding network, so that the embedding characteristics of the i-th layer node can be determined.
需要说明的是,在本申请的实施例中,i可以为大于2的整数。It should be noted that, in the embodiment of the present application, i may be an integer greater than 2.
可以理解的是,在本申请的实施例中,多层网络中的第i层网络可以对第i层节点的几何信息进行并行编码处理。It can be understood that, in the embodiment of the present application, the i-th layer network in the multi-layer network can perform parallel encoding processing on the geometric information of the i-th layer node.
进一步地,在本申请的实施例中,在对点云进行几何编码处理时,可以先进行八叉树的构建。其中,采用八叉树结构对点云空间进行递归划分,划分成八个相同大小的子块,并判断每个子块的占有码字情况,当子块内不包含点时记为空,否则记为非空,在递归划分的最后一层记录所有块的占有码字信息,并进行几何编码;通过八叉树结构表达的几何信息一方面可以进一步形成几何码流,另一方面可以在进行几何重建处理时,将重建后的几何信息作为附加信息用于属性编码。Further, in the embodiments of the present application, when performing geometric encoding on the point cloud, an octree may be constructed first. The octree structure is used to recursively divide the point cloud space into eight sub-blocks of the same size, and the occupancy codeword of each sub-block is determined: when a sub-block contains no points it is recorded as empty, otherwise it is recorded as non-empty. At the last layer of the recursive division, the occupancy codeword information of all blocks is recorded and geometrically encoded. The geometric information expressed through the octree structure can, on the one hand, further form a geometric code stream; on the other hand, during geometric reconstruction, the reconstructed geometric information can be used as additional information for attribute encoding.
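“划分为八个子块并记录占有码字”这一步可以用如下Python代码示意(仅为简化示意,并非实际编解码器实现):The step of dividing into eight sub-blocks and recording the occupancy codeword can be sketched in Python as follows (a simplified illustration, not an actual codec implementation):

```python
# Sketch of one octree level: given integer point coordinates inside a cube
# of side 2**depth, split the cube into 8 child octants and derive the 8-bit
# occupancy code of the root (one bit per child, set if the child is non-empty).
def occupancy_code(points, depth):
    half = 1 << (depth - 1)                  # side length of each child octant
    code = 0
    for x, y, z in points:
        octant = ((x >= half) << 2) | ((y >= half) << 1) | (z >= half)
        code |= 1 << octant                  # mark this child as non-empty
    return code                              # value in 1..255

# Two points in opposite corners of an 8x8x8 cube occupy octants 0 and 7:
pts = [(0, 0, 0), (7, 7, 7)]
assert occupancy_code(pts, 3) == (1 << 0) | (1 << 7)  # == 129
```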
可以理解的是,在本申请的实施例中,对于构建的八叉树,可以将父节点层定义为高层,将子节点层定义为低层;也可以将父节点层定义为低层,将子节点层定义为高层。It can be understood that, in the embodiments of the present application, for the constructed octree, the parent node layer can be defined as the higher layer and the child node layer as the lower layer; alternatively, the parent node layer can be defined as the lower layer and the child node layer as the higher layer.
其中,编码器进行编码处理的编码顺序可以为按照构建的八叉树的根节点向叶节点的顺序。相应的,在解码侧,解码顺序可以为按照构建的八叉树的叶节点向根节点的顺序。例如,构建的八叉树的根节点层为第i层,叶节点层依次为第i-1至第一层,那么编码处理的顺序为从第i层至第一层,而解码处理的顺序与编码相反,即从第一层至第i层。The encoding order in which the encoder performs encoding may be from the root node of the constructed octree toward the leaf nodes. Correspondingly, on the decoding side, the decoding order may be from the leaf nodes of the constructed octree toward the root node. For example, if the root node layer of the constructed octree is the i-th layer and the leaf node layers are, in sequence, the (i-1)-th layer down to the first layer, then the order of encoding processing is from the i-th layer to the first layer, and the order of decoding processing is the opposite of encoding, that is, from the first layer to the i-th layer.
可以理解的是,在本申请的实施例中,点云的第i层节点的几何信息可以表征第i层节点的几何信息,例如节点的位置坐标,也可以表征第i层节点的占有码字情况,即占用情况(占位信息)。It can be understood that, in the embodiments of the present application, the geometric information of the i-th layer nodes of the point cloud can represent the geometric information of the i-th layer nodes, such as the position coordinates of the nodes, and can also represent the occupancy codeword situation of the i-th layer nodes, that is, the occupancy situation (placeholder information).
也就是说,在本申请的实施例中,第i层节点的几何信息既包括了节点的位置坐标信息,也包括了节点的占位信息。That is to say, in the embodiment of the present application, the geometric information of the i-th layer node includes both the position coordinate information of the node and the occupancy information of the node.
可以理解的是,在本申请的实施例中,在编码端,第一嵌入网络的输入为第i层节点的几何信息,其中,作为第一嵌入网络的输入的节点的几何信息可以同时包括节点的位置坐标信息和节点的占位信息;而第一解码网络输出的几何信息的码流,主要包括节点的占位信息的码流,可以包括节点的位置坐标信息的码流,也可以不包括节点的位置坐标信息的码流。It can be understood that, in the embodiments of the present application, at the encoding end, the input of the first embedding network is the geometric information of the i-th layer nodes, where the geometric information of the nodes serving as the input of the first embedding network may include both the position coordinate information and the placeholder information of the nodes; the code stream of geometric information output by the first decoding network mainly includes the code stream of the placeholder information of the nodes, and may or may not include the code stream of the position coordinate information of the nodes.
由此可见,在本申请的实施例中,嵌入网络和编码网络需要同时参考节点的占位信息和位置坐标信息,而解码网络则是着重生成占位信息的码流。It can be seen that, in the embodiments of the present application, the embedding network and the encoding network need to refer to both the placeholder information and the position coordinate information of the nodes, while the decoding network focuses on generating the code stream of the placeholder information.
进一步地,在本申请的实施例中,第一嵌入网络可以为占用嵌入网络embedding。其中,第一嵌入网络可以用于将八叉树节点从256维的one-hot向量映射至16维的连续的特征空间,使得在空间中相近的占用情况在特征空间中也相近。Further, in the embodiments of the present application, the first embedding network may be an occupancy embedding network (embedding). The first embedding network can be used to map octree nodes from a 256-dimensional one-hot vector into a 16-dimensional continuous feature space, so that occupancies that are close in space are also close in the feature space.
需要说明的是,在本申请的实施例中,第一嵌入网络可以包括稀疏卷积网络。It should be noted that in the embodiment of the present application, the first embedding network may include a sparse convolutional network.
示例性的,在本申请的实施例中,第一嵌入网络的网络结构可以由多层稀疏卷积网络构成。例如,图3为嵌入网络的网络结构示意图,如图3所示,第一嵌入网络的网络结构可以为三层的稀疏卷积网络。For example, in the embodiment of the present application, the network structure of the first embedding network may be composed of a multi-layer sparse convolutional network. For example, Figure 3 is a schematic diagram of the network structure of the embedding network. As shown in Figure 3, the network structure of the first embedding network can be a three-layer sparse convolutional network.
进一步地,在本申请的实施例中,编码器在将点云中的第i层节点的几何信息输入至第一嵌入网络之后,可以输出第i层节点的嵌入特征。其中,第i层节点的嵌入特征可以对第i层节点映射至特征空间后的占用情况进行确定。Further, in embodiments of the present application, after the encoder inputs the geometric information of the i-th layer node in the point cloud to the first embedding network, the encoder can output the embedding feature of the i-th layer node. Among them, the embedded features of the i-th layer node can determine the occupancy status of the i-th layer node after it is mapped to the feature space.
也就是说,在本申请的实施例中,第i层节点的嵌入特征可以对第i层节点的占用情况进行确定。That is to say, in the embodiment of the present application, the embedded characteristics of the i-th layer node can determine the occupancy status of the i-th layer node.
示例性的,在本申请的实施例中,图4为嵌入网络的实现示意图,如图4所示,x^(l)代表第l层节点的几何信息,e^(l)代表第一嵌入网络输出的第l层节点的嵌入特征,对于第l层节点,编码器可以将第l层节点的几何信息x^(l)输入至第一嵌入网络中,从而可以通过第一嵌入网络获得对应的第l层节点的嵌入特征e^(l)。Exemplarily, in the embodiments of the present application, Figure 4 is a schematic diagram of the implementation of the embedding network. As shown in Figure 4, x^(l) represents the geometric information of the layer-l nodes, and e^(l) represents the embedding features of the layer-l nodes output by the first embedding network. For the layer-l nodes, the encoder can input their geometric information x^(l) into the first embedding network, and thereby obtain the corresponding embedding features e^(l) through the first embedding network.
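上述256维one-hot到16维特征空间的映射等价于一张256×16的查找表,可以示意如下(仅为示意,维度取自上文,查找表为可训练参数):The mapping above, from a 256-dimensional one-hot vector to a 16-dimensional feature space, is equivalent to a 256×16 lookup table, as sketched below (an illustration only; the dimensions are taken from the text, and the lookup table is a trainable parameter):

```python
import numpy as np

# Occupancy embedding as a 256x16 lookup table: each of the 256 possible
# occupancy codes is mapped to a learned 16-d vector, so that after training,
# similar occupancies can end up close in feature space.
rng = np.random.default_rng(2)
table = rng.normal(size=(256, 16))           # the embedding matrix (trainable)

codes = np.array([0b10000001, 0b11111111])   # occupancy codes of two nodes
e = table[codes]                             # embedding lookup, shape (2, 16)
assert e.shape == (2, 16)

# Equivalent one-hot formulation: one_hot @ table gives the same rows.
one_hot = np.eye(256)[codes]
assert np.allclose(one_hot @ table, e)
```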
步骤102、将第i层节点的嵌入特征输入至编码网络,确定第i层节点的潜变量。Step 102: Input the embedded features of the i-th layer node into the encoding network to determine the latent variables of the i-th layer node.
在本申请的实施例中,编码器在将点云中的第i层节点的几何信息输入至第一嵌入网络,输出第i层节点的嵌入特征之后,可以进一步将第i层节点的嵌入特征输入至编码网络,确定第i层节点的潜变量。In the embodiment of the present application, after the encoder inputs the geometric information of the i-th layer node in the point cloud to the first embedding network and outputs the embedding feature of the i-th layer node, it can further add the embedding feature of the i-th layer node Input to the encoding network to determine the latent variables of the i-th layer node.
进一步地,在本申请的实施例中,编码网络可以为encoder网络。其中,编码网络可以用于确定当前层的上一层节点的潜变量与当前层节点的嵌入特征之间的空间相关性,从而获得当前层节点的潜变量。即通过编码网络,可以基于第一嵌入网络输出的第i层节点的嵌入特征确定第i层节点的潜变量。Further, in the embodiment of the present application, the encoding network may be an encoder network. Among them, the encoding network can be used to determine the spatial correlation between the latent variables of the previous layer node of the current layer and the embedded features of the current layer node, thereby obtaining the latent variables of the current layer node. That is, through the encoding network, the latent variable of the i-th layer node can be determined based on the embedding characteristics of the i-th layer node output by the first embedding network.
需要说明的是,在本申请的实施例中,第i层节点的潜变量可以表征第i层的兄弟节点之间的相关性。也就是说,在本申请中,编码器可以充分利用层级潜变量捕捉LIDAR点云的相关性。It should be noted that, in the embodiment of the present application, the latent variable of the i-th layer node can represent the correlation between the i-th layer sibling nodes. That is to say, in this application, the encoder can make full use of hierarchical latent variables to capture the correlation of LIDAR point clouds.
需要说明的是,在本申请的实施例中,编码网络可以通过稀疏卷积来实现,具体可以包括稀疏卷积网络、ReLU激活函数、初始残差网络(Inception Resnet,IRN)、特征融合网络Concatenate。It should be noted that in the embodiment of the present application, the encoding network can be implemented through sparse convolution, which may specifically include sparse convolution network, ReLU activation function, initial residual network (Inception Resnet, IRN), and feature fusion network Concatenate .
示例性的,在本申请的实施例中,编码网络的网络结构可以由稀疏卷积网络、ReLU激活函数、初始残差网络、Concatenate网络构成。例如,图5为编码网络的网络结构示意图,如图5所示,编码网络的网络结构可以为一个卷积核大小为2,步长为2的稀疏卷积层,后接ReLU激活函数,再连接三层IRN,在经过Concatenate网络之后,最后接一层稀疏卷积网络。Exemplarily, in the embodiments of the present application, the network structure of the encoding network may be composed of a sparse convolutional network, a ReLU activation function, an initial residual network, and a Concatenate network. For example, Figure 5 is a schematic diagram of the network structure of the encoding network. As shown in Figure 5, the network structure of the encoding network may be a sparse convolutional layer with a kernel size of 2 and a stride of 2, followed by a ReLU activation function, then three IRN layers, then the Concatenate network, and finally one more sparse convolutional layer.
进一步地,在本申请的实施例中,在第一嵌入网络输出第i层节点的嵌入特征之后,编码器可以继续将第i层节点的嵌入特征输入至编码网络中,进而可以通过编码网络输出第i层节点的潜变量。Furthermore, in an embodiment of the present application, after the first embedding network outputs the embedding features of the i-th layer nodes, the encoder can continue to input the embedding features of the i-th layer nodes into the encoding network, and then the latent variables of the i-th layer nodes can be output through the encoding network.
需要说明的是,在本申请的实施例中,编码网络在利用当前层节点的嵌入特征进行当前层节点的潜变量的确定时,需要结合上一层节点的潜变量。其中,当前层的上一层可以理解为当前节点的父节点层。It should be noted that in the embodiment of the present application, when the coding network uses the embedded features of the current layer node to determine the latent variables of the current layer node, it needs to be combined with the latent variables of the previous layer node. Among them, the previous layer of the current layer can be understood as the parent node layer of the current node.
可以理解的是,在本申请的实施例中,对于当前层的上一层,当上一层中的编码网络确定出对应的潜变量之后,可以将该上一层节点的潜变量输入至当前层的编码网络,从而可以使得当前层的编码网络能够确定上一层节点的潜变量与当前层节点的嵌入特征之间的空间相关性,从而获得当前层节点的潜变量。It can be understood that, in the embodiments of the present application, for the layer above the current layer, after the encoding network of that upper layer determines the corresponding latent variables, the latent variables of the upper-layer nodes can be input into the encoding network of the current layer, so that the encoding network of the current layer can determine the spatial correlation between the latent variables of the upper-layer nodes and the embedding features of the current-layer nodes, thereby obtaining the latent variables of the current-layer nodes.
That is to say, in embodiments of the present application, the input of the encoding network of the layer-i network may include the latent variables of the layer-(i+1) nodes output by the encoding network of the layer-(i+1) network.
Correspondingly, after the encoding network of the layer-i network has determined the latent variables of the layer-i nodes, it may pass them to the encoding network of the layer-(i-1) network; that is, the input of the encoding network of the layer-(i-1) network may include the latent variables of the layer-i nodes output by the encoding network of the layer-i network.
In other words, the encoder may feed the embedding features of the layer-i nodes and the latent variables of the layer-(i+1) nodes into the encoding network of the layer-i network to determine the latent variables of the layer-i nodes. Here, the embedding features of the layer-i nodes are output by the first embedding unit of the layer-i network, and the latent variables of the layer-(i+1) nodes are output by the encoding network of the layer-(i+1) network.
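The chained data flow just described — each layer's encoding network consuming that layer's embedding features plus the latent variables handed over from the previously processed layer, and passing its own latent variables on to the next layer — can be sketched in a few lines. This is an illustrative toy only: `combine` is a stand-in for the learned sparse-convolutional encoding network, and the element-wise sum it uses is an arbitrary placeholder, not the patent's operation.

```python
def combine(embedding, latent_in):
    # Placeholder for the learned encoding network: fuse the layer's embedding
    # features with the latent variables received from the previous layer.
    return [e + f for e, f in zip(embedding, latent_in)]

def encode_layer_by_layer(embeddings):
    """embeddings[0] is the first layer processed; returns one latent per layer."""
    latents = []
    latent_in = [0.0] * len(embeddings[0])   # the first layer has no incoming latent
    for e in embeddings:
        latent_in = combine(e, latent_in)    # f of this layer, handed to the next
        latents.append(latent_in)
    return latents

lats = encode_layer_by_layer([[1.0, 2.0], [0.5, 0.5], [0.0, 1.0]])
```

Each entry of `lats` is the latent produced by one layer, and each layer's output is exactly the input of the next, mirroring the layer-(i+1) → layer-i hand-off in the text.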
By way of example, Figure 6 is a schematic diagram of the implementation of the encoding network. As shown in Figure 6, e^(l) denotes the embedding features of the layer-l nodes output by the first embedding network, f^(l) denotes the latent variables of the layer-l nodes, and f^(l+1) denotes the latent variables of the layer-(l+1) nodes. For the layer-l nodes, the encoder can exploit the spatial correlation between f^(l+1) and e^(l) and output the lower-layer (layer-l) latent variables f^(l). For example, a sparse tensor can be used to represent the octree, where the coordinate matrix of the sparse tensor holds the node coordinates of the octree and the attribute matrix holds the node occupancy codes. The encoding network can be implemented with sparse convolutions: the downsampling of f^(l+1) is performed by a sparse convolution layer with kernel size 2 and stride 2, followed by a ReLU activation function. After that, the initial residual network aggregates sibling-node features and outputs them to the Concatenate network for feature fusion with e^(l); finally, one sparse convolution layer fuses e^(l) with the downsampled f^(l+1) to obtain the latent variables of the lower-layer nodes, i.e. the latent variables f^(l) of the layer-l nodes.
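The sparse-tensor octree representation mentioned above — a coordinate matrix of occupied node positions plus an attribute matrix of 8-bit occupancy codes — can be made concrete with a small sketch. The octant-to-bit mapping used here is one possible convention chosen for illustration, not taken from the patent.

```python
def make_sparse_layer(nodes):
    """nodes: list of ((x, y, z), occupancy_code) pairs for occupied octree nodes."""
    coords = [xyz for xyz, _ in nodes]    # N x 3 coordinate matrix
    attrs = [code for _, code in nodes]   # N x 1 attribute matrix (occupancy codes)
    return coords, attrs

def child_coords(xyz, code):
    """Expand one node's 8-bit occupancy code into its occupied child coordinates."""
    x, y, z = xyz
    children = []
    for bit in range(8):                  # one bit per octant (assumed convention)
        if code & (1 << bit):
            dx, dy, dz = bit & 1, (bit >> 1) & 1, (bit >> 2) & 1
            children.append((2 * x + dx, 2 * y + dy, 2 * z + dz))
    return children

coords, attrs = make_sparse_layer([((0, 0, 0), 0b00000011), ((1, 0, 0), 0b10000000)])
# Expanding the occupancy codes yields the coordinate matrix of the next, finer layer.
next_level = [c for xyz, code in zip(coords, attrs) for c in child_coords(xyz, code)]
```

The coordinate matrix of one layer is thus fully determined by the coordinate and attribute matrices of the layer above it, which is what allows the occupancy codes alone to carry the geometry.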
Step 103: determine the reconstructed values of the residuals of the layer-i nodes according to the latent variables of the layer-i nodes, and determine the reconstructed values of the latent variables of the layer-i nodes according to the reconstructed values of the residuals of the layer-i nodes.
In embodiments of the present application, after feeding the embedding features of the layer-i nodes into the encoding network and determining the latent variables of the layer-i nodes, the encoder may further determine the reconstructed values of the residuals of the layer-i nodes from those latent variables, and then determine the reconstructed values of the latent variables of the layer-i nodes from the reconstructed residuals; at the same time, the reconstructed values of the residuals of the layer-i nodes can be written into the code stream.
It should be noted that, when determining the reconstructed values of the residuals of the layer-i nodes from the latent variables of the layer-i nodes, the encoder may first determine the residuals of the layer-i nodes from the predicted values of the latent variables of the layer-i nodes and the latent variables of the layer-i nodes; it may then quantize the residuals of the layer-i nodes to determine the reconstructed values of the residuals of the layer-i nodes.
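The residual pipeline described above can be sketched with plain element-wise arithmetic and `round()` as the quantizer. This is a hedged toy showing only the data flow: the patent replaces the hard subtraction and addition with learned "soft" networks, and its quantizer is not specified to be rounding.

```python
def encode_residual(latent, predicted):
    # r = f - f_pred: residual between the latent and its prediction
    residual = [f - p for f, p in zip(latent, predicted)]
    # Quantize the residual; the rounded value plays the role of the
    # reconstructed residual that is written to the code stream.
    residual_rec = [float(round(r)) for r in residual]
    # Reconstructed latent = prediction + reconstructed residual
    latent_rec = [p + r for p, r in zip(predicted, residual_rec)]
    return residual_rec, latent_rec

res_rec, lat_rec = encode_residual([1.9, -0.2, 3.4], [1.0, 0.0, 3.0])
```

Because the decoder receives the same reconstructed residuals and can form the same predictions, it recovers exactly the same reconstructed latents, keeping encoder and decoder in sync.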
It can be understood that the encoder can write the reconstructed values of the residuals of the layer-i nodes into the code stream and transmit them to the decoding end. In other words, the encoder can decompose and compress the latent variables by means of residual coding.
Further, the predicted values of the latent variables of the layer-i nodes may be obtained first. Specifically, the predicted values of the latent variables of the layer-i nodes may be produced by the first decoding network of the layer below the current layer, i.e. output by the first decoding network in the layer-(i-1) network.
It should be noted that the encoder may also include a subtraction network, which may be used to determine the residuals of the latent variables.
That is to say, at least one layer of the multi-layer network provided in the encoder may further include a subtraction network. This subtraction network is a soft subtraction network that can replace the usual hard subtraction operation.
By way of example, based on the subtraction network, the residuals can be determined by the following formula:

r^(l) = g_s(f^(l), f̃^(l))

where g_s denotes concatenation followed by one sparse convolution layer, r^(l) denotes the residuals of the layer-l nodes, f^(l) denotes the latent variables of the layer-l nodes, and f̃^(l) denotes the predicted values of the latent variables of the layer-l nodes.
By way of example, when determining the residuals of the layer-i nodes, the encoder may feed the predicted values of the latent variables of the layer-i nodes and the latent variables of the layer-i nodes into the subtraction network, thereby obtaining the residuals of the layer-i nodes.
It should be noted that the subtraction network may include a sparse convolution network and a feature fusion (Concatenate) network.
By way of example, Figure 7 is a schematic diagram of the network structure of the subtraction network. As shown in Figure 7, the subtraction network may consist of a Concatenate network followed by a sparse convolution layer with kernel size 3 and stride 1.
By way of example, Figure 8 is a schematic diagram of the implementation of the subtraction network. As shown in Figure 8, f^(l) denotes the latent variables of the layer-l nodes, f̃^(l) denotes the predicted values of the latent variables of the layer-l nodes, and r^(l) denotes the residuals of the layer-l nodes. After f^(l) and f̃^(l) are fused by the Concatenate network, the result passes through the sparse convolution layer, which finally outputs the residuals r^(l) of the layer-l nodes.
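The idea of "soft" subtraction can be sketched in miniature: instead of computing f − f̃ directly, the two feature vectors are concatenated and passed through a learned projection. With the projection weights fixed to (+1, −1) the layer reproduces hard subtraction exactly, while training is free to learn a different fusion. The weights below are hypothetical placeholders, not values from the patent, and a single linear map stands in for the real sparse convolution.

```python
def soft_subtract(f, f_pred, w=(1.0, -1.0)):
    # Per position, concatenate [f_i, f_pred_i] and apply a linear
    # (1x1-convolution-like) projection with learnable weights w.
    return [w[0] * a + w[1] * b for a, b in zip(f, f_pred)]

# With w = (1, -1) the "soft" layer degenerates to plain subtraction.
r = soft_subtract([2.0, 5.0], [1.5, 4.0])
```

The soft addition network described later is the mirror image: the same concatenate-then-convolve structure, initialized or trained so that it can realize (but is not limited to) element-wise addition.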
Further, after quantizing the obtained residuals of the layer-i nodes and determining the reconstructed values of the residuals of the layer-i nodes, the encoder may further determine the reconstructed values of the latent variables of the layer-i nodes from the reconstructed values of the residuals of the layer-i nodes and the predicted values of the latent variables of the layer-i nodes. Here, the predicted values of the latent variables of the layer-i nodes may be output by the first decoding network in the layer-(i-1) network.
It should be noted that the encoder may also include an addition network, which may be used to determine the reconstructed values of the latent variables.
That is to say, at least one layer of the multi-layer network provided in the encoder may further include an addition network. This addition network is a soft addition network that can replace the usual hard addition operation.
By way of example, based on the addition network, the reconstructed values of the latent variables can be determined by the following formula:

f̂^(l) = g_a(r̂^(l), f̃^(l))

where g_a denotes concatenation followed by one sparse convolution layer, r̂^(l) denotes the reconstructed values of the residuals of the layer-l nodes, f̂^(l) denotes the reconstructed values of the latent variables of the layer-l nodes, and f̃^(l) denotes the predicted values of the latent variables of the layer-l nodes.
By way of example, when determining the reconstructed values of the latent variables of the layer-i nodes, the encoder may first feed the reconstructed values of the residuals of the layer-i nodes and the predicted values of the latent variables of the layer-i nodes into the addition network, thereby obtaining the reconstructed values of the latent variables of the layer-i nodes.
It should be noted that the addition network may include a sparse convolution network and a feature fusion (Concatenate) network.
By way of example, Figure 9 is a schematic diagram of the network structure of the addition network. As shown in Figure 9, the addition network may consist of a Concatenate network followed by a sparse convolution layer with kernel size 3 and stride 1.
By way of example, Figure 10 is a schematic diagram of the implementation of the addition network. As shown in Figure 10, r̂^(l) denotes the reconstructed values of the residuals of the layer-l nodes, f̂^(l) denotes the reconstructed values of the latent variables of the layer-l nodes, and f̃^(l) denotes the predicted values of the latent variables of the layer-l nodes. After r̂^(l) and f̃^(l) are fused by the Concatenate network, the result passes through the sparse convolution layer, which finally outputs the reconstructed values f̂^(l) of the latent variables of the layer-l nodes.
It can be understood that using the soft addition/subtraction networks (the addition network and the subtraction network) in place of the original hard addition and subtraction greatly improves the flexibility of the network.
Step 104: feed the reconstructed values of the latent variables of the layer-i nodes into the first decoding network, and generate the code stream of the geometric information of the layer-i nodes.
In embodiments of the present application, after determining the reconstructed values of the latent variables of the layer-i nodes from the latent variables of the layer-i nodes, the encoder may further feed those reconstructed values into the first decoding network, thereby generating the code stream of the geometric information of the layer-i nodes.
It should be noted that, before the code stream of the geometric information of the layer-i nodes is generated by the first decoding network, the first embedding networks of layers i down to 1 have already output the corresponding embedding features, the encoding networks of layers i down to 1 have already output the corresponding latent variables, and, at the same time, layers i-1 down to 1 have already output the corresponding reconstructed values of the latent variables.
Further, the input of the first decoding network of the layer-i network may include: the embedding features of the layer-(i-2) nodes output by the first embedding network of the layer-(i-2) network, the embedding features of the layer-(i-1) nodes output by the first embedding network of the layer-(i-1) network, and the reconstructed values of the latent variables of the layer-(i-1) nodes obtained by the layer-(i-1) network; the input of the first decoding network of the layer-i network may also include the reconstructed values of the latent variables of the layer-i nodes.
It should be noted that, when generating the code stream of the geometric information of the layer-i nodes from the reconstructed values of the latent variables of the layer-i nodes, the encoder may first feed the reconstructed values of the latent variables of the layer-i nodes, the embedding features of the layer-(i-2) nodes, the embedding features of the layer-(i-1) nodes, and the reconstructed values of the latent variables of the layer-(i-1) nodes into the first decoding network of the layer-i network, thereby generating the code stream of the geometric information of the layer-i nodes.
It can be understood that, based on the first decoding network, the probability parameters of the layer-i nodes can be determined first; the code stream of the geometric information of the layer-i nodes can then be generated from those probability parameters.
That is to say, the embedding features of the layer-(i-2) nodes, the embedding features of the layer-(i-1) nodes, and the reconstructed values of the latent variables of the layer-(i-1) nodes may be obtained first. Here, the embedding features of the layer-(i-2) nodes may be produced by the first embedding network two layers below the current layer, i.e. output by the first embedding network in the layer-(i-2) network; the embedding features of the layer-(i-1) nodes may be produced by the first embedding network of the layer below the current layer, i.e. output by the first embedding network in the layer-(i-1) network; and the reconstructed values of the latent variables of the layer-(i-1) nodes may be obtained by the network of the layer below the current layer.
It should be noted that the first decoding network may include a sparse convolution network, a deconvolution layer, an initial residual network, a feature fusion network, and a ReLU activation function.
By way of example, Figure 11 is a first schematic diagram of the network structure of the decoding network. As shown in Figure 11, the first decoding network may be a Concatenate network followed by two sparse convolution layers with kernel size 3 and stride 1, followed by an arithmetic coder such as Binary AE.
By way of example, Figure 12 is a first schematic diagram of the implementation of the decoding network. As shown in Figure 12, f̂^(l) denotes the reconstructed values of the latent variables of the layer-l nodes, f̂^(l-1) denotes the reconstructed values of the latent variables of the layer-(l-1) nodes, e^(l-1) denotes the embedding features of the layer-(l-1) nodes output by the first embedding network, and e^(l-2) denotes the embedding features of the layer-(l-2) nodes output by the first embedding network. After f̂^(l), f̂^(l-1), e^(l-1) and e^(l-2) have been obtained, they can be concatenated by the Concatenate network and then passed through a two-layer sparse convolution network to predict the probability p^(l) of each node of layer l; finally, an arithmetic coder such as Binary AE entropy-encodes the result to generate the binary code stream corresponding to x^(l).
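The entropy-coding step above can be illustrated by its ideal cost: once the network predicts, for each node, the probability of the occupancy symbol that actually occurred, an arithmetic coder such as the Binary AE mentioned in the text can code the layer at close to −log2(p) bits per symbol. The sketch below computes only that ideal cost, not an actual arithmetic-coded bitstream.

```python
import math

def ideal_bits(probs):
    """probs: predicted probability of the symbol that actually occurred, per node.

    Returns the ideal (Shannon) code length in bits for the whole layer; an
    arithmetic coder approaches this bound as the layer grows.
    """
    return sum(-math.log2(p) for p in probs)

# Three nodes whose true symbols were predicted with probability 0.5, 0.25, 1.0:
bits = ideal_bits([0.5, 0.25, 1.0])
```

Sharper probability predictions (p closer to 1 for the true symbol) directly shrink the code stream, which is why the decoding network conditions its predictions on the reconstructed latents and the coarser-layer embeddings.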
Further, Figure 13 is a second schematic flowchart of the implementation of the encoding method proposed in the embodiments of the present application. As shown in Figure 13, after the reconstructed values of the latent variables of the layer-i nodes have been determined from the latent variables of the layer-i nodes, i.e. after step 103, the encoding method performed by the encoder may further include the following step:
Step 105: feed the reconstructed values of the latent variables of the layer-i nodes into the first decoding network, and determine the predicted values of the latent variables of the layer-(i+1) nodes.
In embodiments of the present application, after determining the reconstructed values of the latent variables of the layer-i nodes from the latent variables of the layer-i nodes, the encoder may further feed those reconstructed values into the first decoding network, thereby determining the predicted values of the latent variables of the layer-(i+1) nodes.
It should be noted that, before the predicted values of the latent variables of the layer-(i+1) nodes are generated by the first decoding network, the first embedding networks of layers i down to 1 have already output the corresponding embedding features, the encoding networks of layers i down to 1 have already output the corresponding latent variables, and, at the same time, layers i-1 down to 1 have already output the corresponding reconstructed values of the latent variables.
Further, the input of the first decoding network of the layer-i network may include: the embedding features of the layer-(i-1) nodes output by the first embedding network of the layer-(i-1) network, and the reconstructed values of the latent variables of the layer-(i-1) nodes obtained by the layer-(i-1) network; the input of the first decoding network of the layer-i network may also include the reconstructed values of the latent variables of the layer-i nodes and the embedding features of the layer-i nodes.
It should be noted that, when determining the predicted values of the latent variables of the layer-(i+1) nodes from the reconstructed values of the latent variables of the layer-i nodes, the encoder may feed the reconstructed values of the latent variables of the layer-i nodes, the embedding features of the layer-i nodes, the embedding features of the layer-(i-1) nodes, and the reconstructed values of the latent variables of the layer-(i-1) nodes into the first decoding network of the layer-i network, thereby determining the predicted values of the latent variables of the layer-(i+1) nodes.
That is to say, the embedding features of the layer-(i-1) nodes and the reconstructed values of the latent variables of the layer-(i-1) nodes may be obtained first. Here, the embedding features of the layer-(i-1) nodes may be produced by the first embedding network of the layer below the current layer, i.e. output by the first embedding network in the layer-(i-1) network; the reconstructed values of the latent variables of the layer-(i-1) nodes may be obtained by the network of the layer below the current layer.
It should be noted that the first decoding network may include a sparse convolution network, a deconvolution layer, an initial residual network, a feature fusion network, and a ReLU activation function.
By way of example, Figure 14 is a second schematic diagram of the network structure of the decoding network. As shown in Figure 14, the first decoding network may be a Concatenate network followed by a sparse convolution layer with kernel size 2 and stride 2, followed by three initial residual networks, followed by a sparse convolution layer with kernel size 3 and stride 1.
By way of example, Figure 15 is a second schematic diagram of the implementation of the decoding network. As shown in Figure 15, f̂^(l) denotes the reconstructed values of the latent variables of the layer-l nodes, f̂^(l-1) denotes the reconstructed values of the latent variables of the layer-(l-1) nodes, e^(l) denotes the embedding features of the layer-l nodes output by the first embedding network, and e^(l-1) denotes the embedding features of the layer-(l-1) nodes output by the first embedding network. After f̂^(l), f̂^(l-1), e^(l) and e^(l-1) have been obtained, they can be concatenated by the Concatenate network, then passed through a deconvolution layer with kernel size 2 and stride 2, and then through the initial residual network to obtain the predicted values f̃^(l+1) of the latent variables of the layer-(l+1) nodes.
It can thus be seen that, based on different inputs, the first decoding network can produce two different output branches: one branch is the code stream corresponding to the geometric information of the layer-i nodes of the octree, and the other branch is the predicted values of the latent variables of the layer-(i+1) nodes of the octree.
It can be understood that the code stream corresponding to the geometric information of the layer-i nodes of the octree can be transmitted to the decoding end, while the predicted values of the latent variables of the layer-(i+1) nodes of the octree can be fed into the layer-(i+1) network to determine the residuals of the layer-(i+1) nodes and the reconstructed values of the latent variables of the layer-(i+1) nodes.
Further, for the first layer, the encoder may first quantize the latent variables of the first-layer nodes output by the encoding unit of the first-layer network to determine the reconstructed values of the latent variables of the first-layer nodes; it may then feed the reconstructed values of the latent variables of the first-layer nodes and a preset embedding feature into the first decoding unit of the first-layer network, which in turn outputs the code stream of the geometric information of the first-layer nodes and the predicted values of the latent variables of the second-layer nodes, respectively.
By way of example, the first layer may be the level of the octree at which all leaf nodes are empty. That is, when the octree is built for the point cloud, the last level of the octree that cannot be divided further is the first layer.
By way of example, the first layer may also be the level of the octree that is divided down to the smallest unit, e.g. 1x1x1 unit blocks. That is, when the octree is built for the point cloud, the last level of the octree, divided down to the smallest unit blocks, is the first layer.
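To make the "down to 1x1x1" notion concrete, a toy octree build is sketched below: points are bucketed into cells whose side halves at each level until unit cells are reached. This is an illustration of the level structure only, not the patent's construction procedure; the cube side is assumed to be a power of two.

```python
def build_octree_levels(points, size):
    """Return, per level, the sorted occupied cell coordinates.

    points: iterable of integer (x, y, z) inside a cube of side `size`.
    size:   cube side length, assumed to be a power of two.
    """
    levels = []
    cell = size
    while cell >= 1:
        # Each point falls into exactly one cell at this resolution.
        occupied = sorted({(x // cell, y // cell, z // cell) for x, y, z in points})
        levels.append(occupied)
        cell //= 2
    return levels

levels = build_octree_levels([(0, 0, 0), (3, 3, 3)], size=4)
# levels[0] is the single root cell; levels[-1] holds the 1x1x1 cells,
# i.e. the level the text calls the first layer.
```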
Further, in embodiments of the present application, the encoder may use a factorized variational auto-encoder style entropy coder to compress the residuals r^(l).
By way of example, for a variable y to be compressed, the quantization process is non-differentiable, so uniform noise U(-0.5, 0.5) is used in place of quantization during training. Denoting the quantized variable by ỹ, we have:

ỹ = y + u, u ~ U(-0.5, 0.5)

Defining p(x) as the probability distribution of x, and assuming independence across the dimensions of ỹ, we have:

p(ỹ) = ∏_i p(ỹ_i)

A neural network is used to fit the cumulative distribution c(y); based on p(ỹ), c(y) should have the following properties:

- defined for y ∈ (-∞, ∞)
- c(y) is monotonically increasing
- c(-∞) = 0, c(∞) = 1

Therefore the following neural network is used to fit c(y):

f_k(x) = g_k(H^(k) x + b^(k)), 1 ≤ k < K
f_K(x) = sigmoid(H^(K) x + b^(K))

The probability p(ỹ) is then computed from c(y) as follows:

p(ỹ) = c(ỹ + 1/2) - c(ỹ - 1/2)

Once the probability distribution of ỹ has been obtained, an arithmetic coder is used to entropy encode and decode ỹ.
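The construction above — a monotone CDF c(y) whose differences give the probability mass of each quantized value — can be sketched in one dimension. The sketch simplifies the K-layer network to a single affine step followed by a sigmoid, and the scale/bias values are arbitrary placeholders rather than trained parameters; only the structural property p(ỹ) = c(ỹ + 1/2) − c(ỹ − 1/2) is being demonstrated.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def c(y, h=1.0, b=0.0):
    # Toy cumulative distribution: one affine layer with positive scale h,
    # then sigmoid. Positive h keeps c monotonically increasing, with
    # c(-inf) = 0 and c(inf) = 1, as the text requires.
    return sigmoid(h * y + b)

def pmf(y_hat):
    # Probability mass of the quantized (integer) value y_hat.
    return c(y_hat + 0.5) - c(y_hat - 0.5)

# The masses over a wide integer range telescope to c(max+0.5) - c(min-0.5) ≈ 1.
total = sum(pmf(k) for k in range(-20, 21))
```

Because the per-symbol masses sum to (essentially) one and are strictly positive, they form a valid model that an arithmetic coder can consume directly.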
Further, at the encoding end, the final output code stream may be composed of the code streams output by at least one layer of the multi-layer network. For one of those layers, e.g. the layer-l network, the output code stream includes: the code stream of the quantized residuals r̂^(l) of the layer-l nodes produced by the factorized variational auto-encoder, the code stream of the geometric information x^(l) of the layer-l nodes produced by the arithmetic coder, and the structure information.
In summary, the encoding method proposed in steps 101 to 105 can rely on the octree structure and use the constructed embedding network, encoding network and decoding network to introduce latent variables that capture the correlation between sibling nodes within the same layer of the octree, thereby achieving higher compression performance; at the same time, performing encoding and decoding in parallel at the granularity of octree levels can effectively shorten the coding time, greatly improve coding efficiency, and further enhance compression performance.
The embodiments of the present application provide an encoding method. At the encoding end, the encoder includes a first embedding network, an encoding network and a first decoding network. The encoder feeds the geometric information of the layer-i nodes, obtained by dividing the point cloud according to an octree, into the first embedding network to determine the embedding features of the layer-i nodes, where i is greater than 2; feeds the embedding features of the layer-i nodes into the encoding network to determine the latent variables of the layer-i nodes; determines the reconstructed values of the residuals of the layer-i nodes according to the latent variables of the layer-i nodes, and determines the reconstructed values of the latent variables of the layer-i nodes according to the reconstructed values of the residuals of the layer-i nodes; and feeds the reconstructed values of the latent variables of the layer-i nodes into the first decoding network to generate the code stream of the geometric information of the layer-i nodes. It can thus be seen that, when encoding and decoding a point cloud, based on the constructed octree structure, latent variables can be used to characterize the spatial correlation between nodes at the same level of the octree, ultimately achieving higher compression performance; at the same time, performing parallel encoding and decoding of nodes at the same octree level based on the constructed networks improves coding efficiency and further enhances compression performance.
Based on the above embodiments, the encoding method proposed in the embodiments of the present application can be understood as an end-to-end deep entropy model for LiDAR point clouds, or as an end-to-end self-supervised dynamic point cloud compression technique based on deep learning. It relies on the octree structure, introduces latent variables to capture the correlation between octree sibling nodes, and builds the networks with sparse convolutions, thereby achieving state-of-the-art results among lossless entropy models for LiDAR point clouds.
At the same time, the encoding method proposed in the embodiments of the present application can improve the encoding and decoding efficiency of LiDAR point clouds. In particular, the embodiments of the present application achieve a low encoding and decoding time: the nodes are encoded and decoded in parallel on a per-layer basis, and the networks are built with efficient sparse convolutions, so the encoding and decoding time is much lower than that of common encoding and decoding schemes.
It should be noted that the network performing the encoding method proposed in the embodiments of the present application can make full use of hierarchical latent variables to capture LiDAR point cloud correlations, and can decompose and compress the latent variables by means of residual coding. Furthermore, soft addition/subtraction networks can be used in place of the original hard addition and subtraction, which improves the flexibility of the network.
Further, in the embodiments of the present application, Figure 16 is a schematic diagram of the overall framework for performing the encoding method. As shown in Figure 16, the encoder may be provided with a multi-layer network, where at least one layer of the multi-layer network may include a first embedding network, an encoding network, and a first decoding network. For example, each layer consists of an encoding network (encoder), a decoding network (decoder), and an embedding network (embedding).
It can be understood that, in the embodiments of the present application, the l-th layer network in the multi-layer network can perform parallel encoding processing on the geometric information of the l-th layer nodes.
As shown in the figure, x^(l) denotes the geometric information of the l-th layer nodes, e^(l) denotes the embedding features (occupancy embedding) of the l-th layer nodes output by the embedding network, f^(l) denotes the latent variable of the l-th layer nodes, r̂^(l) denotes the reconstructed value of the residual of the l-th layer, f̂^(l) denotes the reconstructed value of the latent variable of the l-th layer nodes, and f̃^(l) denotes the predicted value of the latent variable of the l-th layer nodes. The +/− symbols in the figure denote the soft addition/subtraction networks (addition network/subtraction network), and Q denotes the quantization operation.
Exemplarily, in the embodiments of the present application, as shown in the figure, the network provided in the encoder may consist of 5 layers. For the l-th layer network, the geometric information x^(l) of the l-th layer nodes passes through the embedding network to obtain the occupancy embedding e^(l) (the embedding features). The encoding network captures the spatial correlation between e^(l) and the latent variable f^(l+1) of the upper layer (the (l+1)-th layer), and outputs the latent variable f^(l) of the current layer's nodes. The decoding network splits into two branches, which respectively generate the bitstream of the current octree layer x^(l) and the predicted value of the latent variable of the upper layer's nodes. The residual between the predicted value f̃^(l) of the latent variable of the l-th layer nodes and the true value f^(l) is r^(l); after quantization, it becomes the reconstructed value r̂^(l) of the residual of the l-th layer nodes, which is losslessly entropy-coded by a factorized variational autoencoder.
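The per-layer data flow just described (embedding → encoding network → residual quantization → reconstruction) can be sketched as follows. This is a minimal pure-Python sketch with toy stand-in networks: the function bodies, the 4-dimensional feature width, and the use of plain +/− instead of the learned soft addition/subtraction networks are all illustrative assumptions, not the patent's implementation; only the wiring between x^(l), e^(l), f^(l), r^(l), r̂^(l) and f̂^(l) mirrors the text.

```python
def embedding(x):
    # stand-in embedding: map each 8-bit occupancy code to a small feature vector
    return [[float(c) / 255.0] * 4 for c in x]

def encoder_net(e, f_up):
    # stand-in encoding network: fuse the embedding with the upper layer's latent
    # (node counts are kept equal here purely for simplicity)
    return [[a + b for a, b in zip(ei, fi)] for ei, fi in zip(e, f_up)]

def decoder_net_predict(f_hat):
    # stand-in for the decoder branch that predicts the next layer's latent
    return [[0.5 * v for v in fi] for fi in f_hat]

def quantize(r):
    # Q: uniform rounding stands in for the quantizer
    return [[float(round(v)) for v in ri] for ri in r]

def encode_layer(x_l, f_up, f_pred):
    e_l = embedding(x_l)                      # x^(l) -> e^(l)
    f_l = encoder_net(e_l, f_up)              # e^(l), f^(l+1) -> f^(l)
    r_l = [[a - b for a, b in zip(fi, pi)]    # residual r^(l) = f^(l) - f~^(l)
           for fi, pi in zip(f_l, f_pred)]    # (hard subtraction stand-in)
    r_hat = quantize(r_l)                     # r-hat^(l): entropy-coded to the bitstream
    f_hat = [[p + q for p, q in zip(pi, qi)]  # f-hat^(l) = f~^(l) + r-hat^(l)
             for pi, qi in zip(f_pred, r_hat)]
    return f_hat, decoder_net_predict(f_hat)  # also predict the next layer's latent
```

Because f̂^(l) is rebuilt from the prediction plus the quantized residual, the encoder and decoder stay in sync even though only r̂^(l) is transmitted.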
It should be noted that, in the embodiments of the present application, for the first layer (the (l-5)-th layer as shown in the figure), the encoder may first quantize the latent variable f^(l-5) of the first-layer nodes output by the encoding unit of the first-layer network to determine the reconstructed value of the latent variable of the first-layer nodes; it may then output the reconstructed value of the latent variable of the first-layer nodes and the preset embedding feature e^(l-6) to the first decoding unit of the first-layer network, which in turn outputs the bitstream of the geometric information of the first-layer nodes and the predicted value of the latent variable of the second-layer nodes, respectively.
Further, in the embodiments of the present application, as shown in the figure, the input of the encoding network of the (l-1)-th layer network may include the latent variable of the l-th layer nodes output by the encoding network of the l-th layer network. That is, the encoder may input the embedding features of the (l-1)-th layer nodes and the latent variable of the l-th layer nodes into the encoding network of the (l-1)-th layer network, so as to determine the latent variable of the (l-1)-th layer nodes. The embedding features of the (l-1)-th layer nodes are output by the first embedding unit of the (l-1)-th layer network, and the latent variable of the l-th layer nodes is output by the encoding network of the l-th layer network.
Further, in the embodiments of the present application, as shown in the figure, the input of the first decoding network of the (l-1)-th layer network may include: the embedding features of the (l-3)-th layer nodes output by the first embedding network of the (l-3)-th layer network, the embedding features of the (l-2)-th layer nodes output by the first embedding network of the (l-2)-th layer network, and the reconstructed value of the latent variable of the (l-2)-th layer nodes obtained by the (l-2)-th layer network; the input may also include the reconstructed value of the latent variable of the (l-1)-th layer nodes. Accordingly, when generating the bitstream of the geometric information of the (l-1)-th layer nodes, the encoder may first input the reconstructed value of the latent variable of the (l-1)-th layer nodes, the embedding features of the (l-3)-th layer nodes, the embedding features of the (l-2)-th layer nodes, and the reconstructed value of the latent variable of the (l-2)-th layer nodes into the first decoding network of the (l-1)-th layer network, thereby generating the bitstream of the geometric information of the (l-1)-th layer nodes.
It should be noted that, in the embodiments of the present application, when determining the predicted value of the latent variable of the l-th layer nodes, the encoder may input the reconstructed value of the latent variable of the (l-1)-th layer nodes, the embedding features of the (l-1)-th layer nodes, the embedding features of the (l-2)-th layer nodes, and the reconstructed value of the latent variable of the (l-2)-th layer nodes into the first decoding network of the (l-1)-th layer network, so as to determine the predicted value of the latent variable of the l-th layer nodes.
Further, in the embodiments of the present application, as shown in the figure, after the first decoding network of the (l-1)-th layer network outputs the predicted value of the latent variable of the l-th layer nodes, that predicted value may be used in the subsequent encoding processing of the l-th layer: the subtraction network is used to determine the residual of the l-th layer nodes, and the addition network is used to determine the reconstructed value of the latent variable of the l-th layer nodes.
Exemplarily, in the embodiments of the present application, the encoding network (encoder) may be used to explore the spatial correlation between e^(l) and the latent variable f^(l+1) of the upper layer (the (l+1)-th layer), and to output the latent variable f^(l) of the current layer's nodes. For example, a sparse tensor may be used to represent the octree, where the coordinate matrix of the sparse tensor represents the node coordinates of the octree and the attribute matrix represents the node occupancy codes. The encoding network may be implemented with sparse convolutions, where the downsampling of f^(l+1) is implemented by a sparse convolution layer with a kernel size of 2 and a stride of 2, followed by a ReLU activation function. After that, an initial residual network is used to aggregate sibling-node features, whose output is fed into a Concatenate network for feature fusion with e^(l); finally, one sparse convolution layer fuses e^(l) with the downsampled f^(l+1) to obtain the latent variable of the lower-layer nodes, i.e., the latent variable f^(l) of the l-th layer nodes.
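As a rough illustration of this structure, the sketch below replaces the sparse convolutions with dense 1-D stand-ins over a flat list of node features: a kernel-2/stride-2 "downsampling" layer with ReLU, then concatenation with e^(l) and a final fusing layer. The binary (rather than octal) fan-out, the random weights, and the feature width of 4 are all assumptions for illustration; it is not the patent's sparse-convolution implementation.

```python
import random

random.seed(7)
DIM = 4  # feature width (the text uses 16; kept small here)

def rand_mat(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def dense(w, x):
    # plain matrix-vector product standing in for a sparse convolution
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def relu(v):
    return [max(0.0, u) for u in v]

W_DOWN = rand_mat(DIM, 2 * DIM)  # kernel-2/stride-2 layer: fuses 2 siblings
W_FUSE = rand_mat(DIM, 2 * DIM)  # final layer: fuses e^(l) with the downsampled latent

def encoder_layer(e_l, f_upper):
    """e_l: per-node embeddings at layer l; f_upper: latents of the finer layer
    (twice as many nodes in this binary stand-in). Returns f^(l)."""
    down = [relu(dense(W_DOWN, f_upper[2 * i] + f_upper[2 * i + 1]))  # list '+' = concat
            for i in range(len(f_upper) // 2)]
    return [dense(W_FUSE, ei + di) for ei, di in zip(e_l, down)]
```

The stride-2 step halves the node count so the downsampled latent aligns one-to-one with e^(l) before the final fusion, which is the role of the kernel-2/stride-2 sparse convolution in the text.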
Exemplarily, in the embodiments of the present application, the decoding network (decoder) is used to generate the bitstream of the current octree layer x^(l) and the predicted value of the latent variable of the upper layer's nodes, respectively. One branch: after f̂^(l), f̂^(l-1), e^(l-1), and e^(l-2) are obtained, they may be concatenated through the Concatenate network; a two-layer sparse convolution network then predicts the probability p^(l) of each node in the l-th layer, and finally an arithmetic encoder generates, by entropy coding, the binary bitstream corresponding to x^(l). The other branch: after f̂^(l), f̂^(l-1), e^(l-1), and e^(l-2) are obtained, they may be concatenated through the Concatenate network and passed through a deconvolution layer with a kernel size of 2 and a stride of 2, and then through the initial residual network to obtain the predicted value f̃^(l+1) of the latent variable of the (l+1)-th layer nodes.
Exemplarily, in the embodiments of the present application, the embedding network (embedding) may be used to map an octree node from a 256-dimensional one-hot vector to a 16-dimensional continuous feature space, so that occupancy patterns that are close in space are also close in the feature space.
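Multiplying a one-hot vector by a weight matrix is equivalent to a table lookup, so the embedding described here is essentially a learned 256×16 table indexed by the node's 8-bit occupancy code. A minimal sketch, with a fixed random table standing in for the trained weights (an assumption; the real table is learned end to end):

```python
import random

random.seed(0)
EMBED_DIM = 16
# stand-in for the learned embedding weights: one 16-dim row per occupancy code
TABLE = [[random.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)] for _ in range(256)]

def embed(occupancy_code):
    """Map an 8-bit occupancy code (one of 256 one-hot classes) to a
    16-dimensional continuous feature vector."""
    return TABLE[occupancy_code]
```

Because the lookup replaces the one-hot product, the 256-dimensional vector never needs to be materialized at runtime.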
Exemplarily, in the embodiments of the present application, at least one layer of the multi-layer network provided in the encoder may further include a subtraction network and an addition network. Using soft addition/subtraction networks (the addition network and the subtraction network) instead of the original hard addition and subtraction greatly improves the flexibility of the network.
To sum up, the encoding method proposed in the embodiments of the present application can rely on the octree structure and use the constructed embedding network, encoding network, and decoding network to introduce latent variables that capture the correlation between sibling nodes in the same layer of the octree, thereby achieving higher compression performance. At the same time, performing encoding and decoding in parallel on a per-layer basis effectively shortens the encoding and decoding time, greatly improves encoding and decoding efficiency, and further improves compression performance.
The embodiments of the present application propose a point cloud decoding method, which can be applied to a decoder, where the decoder may include a second embedding network and a second decoding network.
It should be noted that, in the embodiments of the present application, the decoder may be provided with a multi-layer network, where at least one layer of the multi-layer network may include a second embedding network and a second decoding network.
Exemplarily, in the embodiments of the present application, the decoder may be provided with a 5-layer network, where one or more layers of the 5-layer network may include a second embedding network and a second decoding network.
Figure 17 is a first schematic flowchart of an implementation of the decoding method proposed in the embodiments of the present application. As shown in Figure 17, the method by which the decoder performs decoding processing may include the following steps:
Step 201: Decode the bitstream to determine the reconstructed values of the residuals of the i-th layer nodes obtained by partitioning the point cloud according to an octree, where i is greater than 2.
In the embodiments of the present application, by decoding the bitstream, the decoder can determine the reconstructed values of the residuals of the i-th layer nodes obtained by partitioning the point cloud according to the octree.
It should be noted that, in the embodiments of the present application, i may be an integer greater than 2.
It can be understood that, in the embodiments of the present application, the i-th layer network in the multi-layer network can perform parallel decoding processing on the geometric information of the i-th layer nodes.
Further, in the embodiments of the present application, when geometrically encoding the point cloud, an octree may be constructed first. Specifically, the octree structure is used to recursively partition the point cloud space: each block is divided into eight sub-blocks of the same size, and the occupancy codeword of each sub-block is determined; a sub-block is recorded as empty when it contains no points and as non-empty otherwise. At the last level of the recursive partition, the occupancy codeword information of all blocks is recorded and geometrically encoded. The geometric information expressed through the octree structure can, on the one hand, further form a geometry bitstream; on the other hand, during geometry reconstruction, the reconstructed geometric information can be used as side information for attribute encoding.
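The recursive partition just described can be sketched as follows in pure Python. The unit cube as the root volume and the child-indexing convention (bit weights 4/2/1 for the x/y/z halves) are illustrative assumptions; each non-empty node receives an 8-bit occupancy codeword in which bit k is set when child sub-block k contains at least one point.

```python
def build_octree_layers(points, depth):
    """Recursively split the unit cube into 8 children per node and record,
    layer by layer, an 8-bit occupancy codeword for every non-empty node."""
    layers = []
    nodes = [((0.0, 0.0, 0.0), 1.0, points)]  # (origin, edge length, points inside)
    for _ in range(depth):
        codes, children = [], []
        for (ox, oy, oz), size, pts in nodes:
            half = size / 2.0
            buckets = {}
            for p in pts:
                # child index: x-half contributes 4, y-half 2, z-half 1
                k = (p[0] >= ox + half) * 4 + (p[1] >= oy + half) * 2 + (p[2] >= oz + half)
                buckets.setdefault(k, []).append(p)
            code = 0
            for k in range(8):
                if k in buckets:  # bit k set iff child k is non-empty
                    code |= 1 << k
                    off = ((k >> 2) & 1, (k >> 1) & 1, k & 1)
                    children.append(((ox + off[0] * half,
                                      oy + off[1] * half,
                                      oz + off[2] * half), half, buckets[k]))
            codes.append(code)
        layers.append(codes)
        nodes = children  # recurse into the non-empty sub-blocks only
    return layers
```

Only non-empty blocks are subdivided, so the codeword sequence per layer is exactly the occupancy information that the entropy model later compresses.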
Accordingly, in the embodiments of the present application, by parsing the geometry bitstream transmitted from the encoding side, the reconstructed values of the residuals of the i-th layer nodes of the point cloud can be determined.
It can be understood that, in the embodiments of the present application, for the constructed octree, the parent-node layers may be defined as the higher layers and the child-node layers as the lower layers; alternatively, the parent-node layers may be defined as the lower layers and the child-node layers as the higher layers.
The encoding order in which the encoder performs encoding processing may be from the root node of the constructed octree toward the leaf nodes. Correspondingly, on the decoding side, the decoding order may be from the leaf nodes of the constructed octree toward the root node. For example, if the root-node layer of the constructed octree is the i-th layer and the leaf-node layers are, in order, the (i-1)-th layer down to the first layer, then the order of encoding processing is from the i-th layer to the first layer, while the order of decoding processing is the reverse of encoding, i.e., from the first layer to the i-th layer.
It should be noted that, in the embodiments of the present application, at the encoding end, the residuals of the i-th layer nodes may first be determined according to the predicted values of the latent variables of the i-th layer nodes and the latent variables of the i-th layer nodes; the residuals of the i-th layer nodes may then be quantized, whereby the reconstructed values of the residuals of the i-th layer nodes can be determined.
It can be understood that, in the embodiments of the present application, the encoder can write the reconstructed values of the residuals of the i-th layer nodes into the bitstream and transmit them to the decoding end. That is, the encoder can decompose and compress the latent variables by means of residual coding.
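Numerically, the residual coding in these two paragraphs amounts to: subtract the prediction, quantize, transmit, and add the prediction back. A toy example with scalar latents, where uniform rounding stands in for the quantizer Q and plain +/− stands in for the learned soft addition/subtraction networks (both simplifications, not the patent's networks):

```python
def quantize(residual):
    # Q: uniform rounding as a stand-in quantizer
    return [float(round(v)) for v in residual]

f_true = [1.7, -0.3, 2.2]   # latent f of three i-th layer nodes (encoder side)
f_pred = [1.0, 0.0, 2.0]    # prediction f~ from the (i-1)-th layer's decoder

r = [a - b for a, b in zip(f_true, f_pred)]     # residual r
r_hat = quantize(r)                             # r-hat: written to the bitstream
f_hat = [p + q for p, q in zip(f_pred, r_hat)]  # reconstruction f-hat, identical
                                                # at encoder and decoder
```

Only r̂ is transmitted; since both ends derive the same f̃, adding r̂ back yields the same reconstructed latent f̂ on both sides.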
Step 202: Determine the reconstructed values of the latent variables of the i-th layer nodes according to the reconstructed values of the residuals of the i-th layer nodes.
In the embodiments of the present application, after decoding the bitstream to determine the reconstructed values of the residuals of the i-th layer nodes, the decoder can determine the reconstructed values of the latent variables of the i-th layer nodes according to the reconstructed values of the residuals of the i-th layer nodes.
Further, in the embodiments of the present application, the predicted values of the latent variables of the i-th layer nodes may be obtained first. These predicted values may be obtained by prediction from the second decoding network of the layer network below the current layer, i.e., they may be output by the second decoding network in the (i-1)-th layer network.
Further, in the embodiments of the present application, after determining the reconstructed values of the residuals of the i-th layer nodes, the decoder can further determine the reconstructed values of the latent variables of the i-th layer nodes according to the reconstructed values of the residuals of the i-th layer nodes and the predicted values of the latent variables of the i-th layer nodes. The predicted values of the latent variables of the i-th layer nodes may be output by the second decoding network in the (i-1)-th layer.
It should be noted that, in the embodiments of the present application, the decoder may also include an addition network, where the addition network may be used to determine the reconstructed values of the latent variables.
That is, in the embodiments of the present application, at least one layer of the multi-layer network provided in the decoder may further include an addition network. This addition network is a soft addition network, which can be used to replace the common hard addition processing.
Exemplarily, in the embodiments of the present application, based on the addition network, the determination of the reconstructed values of the latent variables can be realized through the above formula (2).
Exemplarily, in the embodiments of the present application, when determining the reconstructed values of the latent variables of the i-th layer nodes, the decoder may first input the reconstructed values of the residuals of the i-th layer nodes and the predicted values of the latent variables of the i-th layer nodes into the addition network, whereby the reconstructed values of the latent variables of the i-th layer nodes can be obtained.
It should be noted that, in the embodiments of the present application, the addition network may include a sparse convolution network and a feature fusion (Concatenate) network.
Exemplarily, in the embodiments of the present application, as shown in Figure 9, the network structure of the addition network may be a Concatenate network followed by a sparse convolution layer with a kernel size of 3 and a stride of 1.
Exemplarily, in the embodiments of the present application, as shown in Figure 10, r̂^(l) denotes the reconstructed value of the residual of the l-th layer nodes, f̂^(l) denotes the reconstructed value of the latent variable of the l-th layer nodes, and f̃^(l) denotes the predicted value of the latent variable of the l-th layer nodes. After r̂^(l) and f̃^(l) are fused by the Concatenate network, the sparse convolution layer operates on the result and finally outputs the reconstructed value f̂^(l) of the latent variable of the l-th layer nodes.
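As a concrete stand-in for this structure, the sketch below implements the "concatenate, then kernel-3/stride-1 convolution" pattern densely over a flat list of node features. The random weights, the feature width of 4, and the zero padding at both ends are illustrative assumptions in place of the learned sparse convolution:

```python
import random

random.seed(3)
DIM = 4
# one output row per feature; the input window covers 3 nodes x (r-hat, f~) concat
W = [[random.uniform(-0.5, 0.5) for _ in range(3 * 2 * DIM)] for _ in range(DIM)]

def soft_add(r_hat, f_pred):
    """Soft addition: concatenate r-hat^(l) and f~^(l) per node, then apply a
    kernel-3/stride-1 'convolution' along the node axis to get f-hat^(l)."""
    x = [r + f for r, f in zip(r_hat, f_pred)]  # per-node concat (width 2*DIM)
    zero = [0.0] * (2 * DIM)
    padded = [zero] + x + [zero]                # zero padding keeps the length
    out = []
    for i in range(len(x)):
        window = padded[i] + padded[i + 1] + padded[i + 2]
        out.append([sum(w * v for w, v in zip(row, window)) for row in W])
    return out
```

Unlike hard addition, the fused layer can learn a data-dependent combination of the residual and the prediction (and even peek at neighbouring nodes), which is the flexibility gain the text refers to.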
It can be understood that, in the embodiments of the present application, using a soft addition network (the addition network) instead of the original hard addition greatly improves the flexibility of the network.
It should be noted that, in the embodiments of the present application, the latent variables of the i-th layer nodes can represent the correlation between sibling nodes in the i-th layer. That is, in the present application, the decoder can make full use of hierarchical latent variables to capture the correlations of LiDAR point clouds.
Step 203: Input the reconstructed values of the latent variables of the i-th layer nodes into the second decoding network to determine the geometric information of the i-th layer nodes.
In the embodiments of the present application, after determining the reconstructed values of the latent variables of the i-th layer nodes according to the reconstructed values of the residuals of the i-th layer nodes, the decoder can input the reconstructed values of the latent variables of the i-th layer nodes into the second decoding network, so as to determine the geometric information of the i-th layer nodes.
It can be understood that, in the embodiments of the present application, the geometric information of the i-th layer nodes of the point cloud can represent, for example, the position coordinates of the nodes, and can also represent the occupancy codewords of the i-th layer nodes, i.e., the occupancy status (occupancy information).
That is, in the embodiments of the present application, the geometric information of the i-th layer nodes includes both the position coordinate information of the nodes and the occupancy information of the nodes.
It can be understood that, in the embodiments of the present application, at the decoding end, the input of the second embedding network is the geometric information of the i-th layer nodes, where the geometric information serving as the input of the second embedding network may include both the position coordinate information and the occupancy information of the nodes; the geometric information output by the second decoding network mainly includes the occupancy information of the nodes, and may or may not include the position coordinate information of the nodes.
It can thus be seen that, in the embodiments of the present application, the embedding network needs to refer to both the occupancy information and the position coordinate information of the nodes, while the decoding network focuses on generating the occupancy information.
It should be noted that, in the embodiments of the present application, before the geometric information of the i-th layer nodes is determined through the second decoding network, the second embedding networks of the (i-1)-th layer network down to the first-layer network have already output the corresponding embedding features; meanwhile, the (i-1)-th layer network down to the first-layer network have already output the corresponding reconstructed values of the latent variables.
Further, in the embodiments of the present application, the input of the second decoding network of the i-th layer network may include: the embedding features of the (i-2)-th layer nodes output by the second embedding network of the (i-2)-th layer network, the embedding features of the (i-1)-th layer nodes output by the second embedding network of the (i-1)-th layer network, and the reconstructed values of the latent variables of the (i-1)-th layer nodes obtained by the (i-1)-th layer network; the input may also include the reconstructed values of the latent variables of the i-th layer nodes.
It should be noted that, in the embodiments of the present application, when generating the geometric information of the i-th layer nodes according to the reconstructed values of the latent variables of the i-th layer nodes, the decoder may first input the reconstructed values of the latent variables of the i-th layer nodes, the embedding features of the (i-2)-th layer nodes, the embedding features of the (i-1)-th layer nodes, and the reconstructed values of the latent variables of the (i-1)-th layer nodes into the second decoding network of the i-th layer network, thereby generating the geometric information of the i-th layer nodes.
It can be understood that, in the embodiments of the present application, based on the second decoding network, the probability parameters of the i-th layer nodes can be determined first; the geometric information of the i-th layer nodes can then be further generated according to the probability parameters of the i-th layer.
That is, in the embodiments of the present application, the embedding features of the (i-2)-th layer nodes, the embedding features of the (i-1)-th layer nodes, and the reconstructed values of the latent variables of the (i-1)-th layer nodes may be obtained first. The embedding features of the (i-2)-th layer nodes may be obtained by the second embedding network of the network two layers below the current layer, i.e., output by the second embedding network in the (i-2)-th layer network; the embedding features of the (i-1)-th layer nodes may be obtained by the second embedding network of the layer network below the current layer, i.e., output by the second embedding network in the (i-1)-th layer network; and the reconstructed values of the latent variables of the (i-1)-th layer nodes may be obtained by the network of the layer below the current layer.
It should be noted that, in the embodiments of the present application, the second decoding network may include a sparse convolution network, a deconvolution layer, an initial residual network, a feature fusion network, and a ReLU activation function.
Exemplarily, in the embodiments of the present application, as shown in Figure 11, the network structure of the second decoding network may be a Concatenate network followed by two sparse convolution layers with a kernel size of 3 and a stride of 1, followed by an arithmetic decoder, such as Binary AE.
Exemplarily, in the embodiments of the present application, as shown in Figure 12, f̂(l) denotes the reconstructed values of the latent variables of the l-th layer nodes, f̂(l-1) denotes the reconstructed values of the latent variables of the (l-1)-th layer nodes, e(l-1) denotes the embedding features of the (l-1)-th layer nodes output by the second embedding network, and e(l-2) denotes the embedding features of the (l-2)-th layer nodes output by the second embedding network. After f̂(l), f̂(l-1), e(l-1) and e(l-2) are obtained, they may be concatenated through the Concatenate network; the probability p(l) of each node in the l-th layer may then be predicted through a two-layer sparse convolution network, and finally x(l) may be generated by entropy decoding through an arithmetic coder, such as Binary AE.
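To illustrate why the decoder only needs the predicted probabilities p(l) to recover the occupancy bits x(l) losslessly, the following toy float-based arithmetic coder round-trips a short bit sequence. This is an illustrative stand-in, not the Binary AE codec used in the embodiments; a real codec uses integer renormalization and works on arbitrarily long streams.

```python
# Toy arithmetic coder: the encoder and decoder narrow the same probability
# intervals, so identical p values on both sides guarantee lossless recovery.
# Float precision limits this sketch to short sequences.

def ac_encode(bits, probs):
    low, high = 0.0, 1.0
    for b, p in zip(bits, probs):            # p = P(bit == 1)
        mid = low + (high - low) * (1.0 - p)
        if b:
            low = mid                         # '1' takes the upper sub-interval
        else:
            high = mid                        # '0' takes the lower sub-interval
    return (low + high) / 2.0                 # any number inside the interval

def ac_decode(code, probs):
    bits, low, high = [], 0.0, 1.0
    for p in probs:
        mid = low + (high - low) * (1.0 - p)
        if code >= mid:
            bits.append(1)
            low = mid
        else:
            bits.append(0)
            high = mid
    return bits

occupancy = [1, 0, 1, 1, 0, 0, 1, 0]                   # example x(l) bits
p_l = [0.9, 0.2, 0.8, 0.7, 0.1, 0.3, 0.6, 0.4]         # example network output
code = ac_encode(occupancy, p_l)
assert ac_decode(code, p_l) == occupancy
```

The better the network's p(l) matches the true occupancy statistics, the narrower the intervals for likely symbols and the fewer bits the real codec spends.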
Further, in the embodiments of the present application, after determining the geometric information of the i-th layer nodes, the decoder may continue to input the geometric information of the i-th layer nodes into the second embedding network, so as to determine the embedding features of the i-th layer nodes.
Further, in the embodiments of the present application, the second embedding network may be an occupancy embedding network. The second embedding network may be used to map octree nodes from 256-dimensional one-hot vectors into a 16-dimensional continuous feature space, so that occupancy patterns that are close in space are also close in the feature space.
It should be noted that, in the embodiments of the present application, the second embedding network may include a sparse convolution network.
Exemplarily, in the embodiments of the present application, the network structure of the second embedding network may be composed of multiple sparse convolution layers. For example, as shown in Figure 3, the network structure of the second embedding network may be a three-layer sparse convolution network.
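The core of the occupancy embedding, mapping each node's 8-bit occupancy byte (one of 256 patterns, i.e. a 256-dimensional one-hot vector) to a 16-dimensional vector, reduces to a table lookup. The sketch below uses deterministic placeholder weights in place of the trained sparse-convolution parameters, so it only illustrates the shape of the mapping, not the learned geometry.

```python
# Sketch of the occupancy embedding as a lookup table: multiplying a one-hot
# vector by a (256 x 16) weight matrix simply selects one of its rows.

import math

DIM = 16

def make_embedding_table(vocab=256, dim=DIM):
    # deterministic placeholder weights standing in for trained parameters
    return [[math.sin(0.1 * (v * dim + d)) for d in range(dim)]
            for v in range(vocab)]

def embed_occupancy(occupancy_byte, table):
    # equivalent to one_hot(occupancy_byte) @ table
    return table[occupancy_byte]

table = make_embedding_table()
e = embed_occupancy(0b10110001, table)   # a node with children 0, 4, 5 and 7
assert len(e) == DIM
```

In the trained network, rows for occupancy patterns that are close in space end up close in this 16-dimensional feature space.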
Further, in the embodiments of the present application, after the decoder inputs the geometric information of the i-th layer nodes of the point cloud into the second embedding network, the embedding features of the i-th layer nodes may be output. The embedding features of the i-th layer nodes may characterize the occupancy of the i-th layer nodes after they are mapped into the feature space.
That is to say, in the embodiments of the present application, the embedding features of the i-th layer nodes may characterize the occupancy of the i-th layer nodes.
Exemplarily, in the embodiments of the present application, as shown in Figure 4, x(l) denotes the geometric information of the l-th layer nodes, and e(l) denotes the embedding features of the l-th layer nodes output by the second embedding network. For the l-th layer nodes, the decoder may input the geometric information x(l) of the l-th layer nodes into the second embedding network, so that the corresponding embedding features e(l) of the l-th layer nodes may be obtained through the second embedding network.
Further, in the embodiments of the present application, Figure 18 is a second schematic flowchart of the implementation of the decoding method proposed in the embodiments of the present application. As shown in Figure 18, after the reconstructed values of the latent variables of the i-th layer nodes are determined according to the reconstructed values of the residuals of the i-th layer nodes, that is, after step 202, the decoding method performed by the decoder may further include the following step:
Step 204: input the reconstructed values of the latent variables of the i-th layer nodes into the second decoding network, and determine the predicted values of the latent variables of the (i+1)-th layer nodes.
In the embodiments of the present application, after determining the reconstructed values of the latent variables of the i-th layer nodes according to the reconstructed values of the residuals of the i-th layer nodes, the decoder may further input the reconstructed values of the latent variables of the i-th layer nodes into the second decoding network, so as to determine the predicted values of the latent variables of the (i+1)-th layer nodes.
It should be noted that, in the embodiments of the present application, before the second decoding network generates the predicted values of the latent variables of the (i+1)-th layer nodes, the second embedding networks of the i-th layer network down to the first layer network have already output the corresponding embedding features; meanwhile, the (i-1)-th layer network down to the first layer network have already output the corresponding reconstructed values of the latent variables.
Further, in the embodiments of the present application, the input of the second decoding network of the i-th layer network may include: the embedding features of the (i-1)-th layer nodes output by the second embedding network of the (i-1)-th layer network, and the reconstructed values of the latent variables of the (i-1)-th layer nodes obtained by the (i-1)-th layer network; the input of the second decoding network of the i-th layer network may further include the reconstructed values of the latent variables of the i-th layer nodes and the embedding features of the i-th layer nodes.
It should be noted that, in the embodiments of the present application, when the decoder determines the predicted values of the latent variables of the (i+1)-th layer nodes according to the reconstructed values of the latent variables of the i-th layer nodes, it may input the reconstructed values of the latent variables of the i-th layer nodes, the embedding features of the i-th layer nodes, the embedding features of the (i-1)-th layer nodes, and the reconstructed values of the latent variables of the (i-1)-th layer nodes into the second decoding network of the i-th layer network, so as to determine the predicted values of the latent variables of the (i+1)-th layer nodes.
That is to say, in the embodiments of the present application, the embedding features of the (i-1)-th layer nodes and the reconstructed values of the latent variables of the (i-1)-th layer nodes may be obtained first. The embedding features of the (i-1)-th layer nodes may be obtained from the second embedding network of the layer immediately below the current layer, that is, output by the second embedding network in the (i-1)-th layer network; the reconstructed values of the latent variables of the (i-1)-th layer nodes may be obtained from the network of the layer immediately below the current layer.
It should be noted that, in the embodiments of the present application, the second decoding network may include a sparse convolution network, a deconvolution layer, an initial residual network, a feature fusion network, and a ReLU activation function.
Exemplarily, in the embodiments of the present application, as shown in Figure 14, the network structure of the second decoding network may be a Concatenate network followed by a sparse convolution layer with a kernel size of 2 and a stride of 2, followed by three initial residual networks, followed by a sparse convolution layer with a kernel size of 3 and a stride of 1.
Exemplarily, in the embodiments of the present application, as shown in Figure 15, f̂(l) denotes the reconstructed values of the latent variables of the l-th layer nodes, f̂(l-1) denotes the reconstructed values of the latent variables of the (l-1)-th layer nodes, e(l) denotes the embedding features of the l-th layer nodes output by the second embedding network, and e(l-1) denotes the embedding features of the (l-1)-th layer nodes output by the second embedding network. After f̂(l), f̂(l-1), e(l) and e(l-1) are obtained, they may be concatenated through the Concatenate network, then passed through a deconvolution layer with a kernel size of 2 and a stride of 2, and the predicted values f̃(l+1) of the latent variables of the (l+1)-th layer nodes may then be obtained through the initial residual networks.
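The essential property of the kernel-2, stride-2 deconvolution in this branch is that it doubles the resolution, which is needed when predicting latents for the next, finer octree level. The following 1-D toy with made-up scalar weights illustrates only that property; the specification's layer is a learned multi-channel sparse deconvolution.

```python
# Toy 1-D transposed convolution, kernel size 2, stride 2: each input sample
# produces two output samples, so the output length is twice the input length.

def deconv_k2s2(x, w0=0.7, w1=0.3):
    out = []
    for v in x:
        out.append(w0 * v)   # first tap of the kernel
        out.append(w1 * v)   # second tap of the kernel
    return out

coarse = [1.0, 2.0, 3.0]     # stand-in for one channel of f̂(l)
fine = deconv_k2s2(coarse)   # stand-in for one channel of f̃(l+1)
assert len(fine) == 2 * len(coarse)
```

In three dimensions the same stride-2 operator maps each parent cell onto its eight child cells.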
It can thus be seen that, in the embodiments of the present application, the second decoding network may output two different branches based on different inputs: one branch is the geometric information of the i-th layer nodes of the octree, and the other branch is the predicted values of the latent variables of the (i+1)-th layer nodes of the octree.
It can be understood that, in the embodiments of the present application, the predicted values of the latent variables of the (i+1)-th layer nodes of the octree may be input into the (i+1)-th layer network, for determining the reconstructed values of the latent variables of the (i+1)-th layer nodes.
Further, in the embodiments of the present application, for the first layer, after the decoder determines the reconstructed values of the latent variables of the first-layer nodes using the reconstructed values of the residuals of the first-layer nodes obtained by decoding the bitstream, it may then output the reconstructed values of the latent variables of the first-layer nodes and the preset embedding features to the second decoding unit of the first-layer network, which may then output the geometric information of the first-layer nodes and the predicted values of the latent variables of the second-layer nodes, respectively.
Exemplarily, in the embodiments of the present application, the first layer may be the level of the octree at which all leaf nodes are empty. That is, when the octree is constructed for the point cloud, the last level of the octree that cannot be further divided is the first layer.
Exemplarily, in the embodiments of the present application, the first layer may also be the level of the octree that is divided into the minimum unit, for example, divided into minimum unit blocks of 1x1x1. That is, when the octree is constructed for the point cloud, the last level of the octree, divided into the minimum unit blocks, is the first layer.
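For context on the octree partition the layers refer to, the following sketch converts a small set of integer points into per-level occupancy bytes: at each level every occupied cell splits into 8 children, the child index is built from one bit of each coordinate, and the 8 child flags form the node's occupancy byte. The function name and data layout are illustrative, not from the specification.

```python
# Illustrative octree builder: points are assumed to be non-negative integers
# that fit in `depth` bits per coordinate.

def octree_levels(points, depth):
    levels = []
    cells = {(0, 0, 0): list(points)}          # root cell -> points inside it
    for d in range(depth):
        shift = depth - 1 - d                  # which coordinate bit this level uses
        children = {}
        occ = {}
        for cell, pts in cells.items():
            byte = 0
            for x, y, z in pts:
                cx, cy, cz = (x >> shift) & 1, (y >> shift) & 1, (z >> shift) & 1
                idx = (cx << 2) | (cy << 1) | cz       # child index in 0..7
                byte |= 1 << idx                        # mark the child occupied
                child = (cell[0] * 2 + cx, cell[1] * 2 + cy, cell[2] * 2 + cz)
                children.setdefault(child, []).append((x, y, z))
            occ[cell] = byte
        levels.append(occ)
        cells = children
    return levels

pts = [(0, 0, 0), (3, 3, 3)]
lv = octree_levels(pts, depth=2)   # lv[d] maps each occupied cell to its byte
```

The per-level occupancy bytes produced here are exactly the 256-pattern symbols that the embedding network and the entropy model operate on.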
Further, in the embodiments of the present application, the decoder may use a factorized variational auto-encoder style entropy codec to compress the residual r(l).
Further, in the embodiments of the present application, at the decoding end, the finally received bitstream may be composed of the bitstreams corresponding to at least one layer of the multi-layer network. For one of the layers, such as the l-th layer network, the corresponding bitstream includes the bitstream of the residual of the l-th layer nodes generated by the factorized variational auto-encoder, the bitstream of the geometric information x(l) of the l-th layer nodes generated by the arithmetic coder, and the structure information.
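A per-layer bitstream of this shape can be containerized with simple length prefixes, as in the sketch below. This layout is a hedged assumption for illustration only; the specification's actual syntax and byte order may differ.

```python
# Hypothetical per-layer container: residual sub-stream (factorized entropy
# model), geometry sub-stream (arithmetic coder), and structure information,
# each prefixed by a little-endian 32-bit length.

import struct

def pack_layer(residual: bytes, geometry: bytes, structure: bytes) -> bytes:
    return b"".join(struct.pack("<I", len(part)) + part
                    for part in (residual, geometry, structure))

def unpack_layer(blob: bytes):
    parts, off = [], 0
    for _ in range(3):
        (n,) = struct.unpack_from("<I", blob, off)
        off += 4
        parts.append(blob[off:off + n])
        off += n
    return tuple(parts)

layer = pack_layer(b"\x01\x02", b"\xff", b"octree-depth=12")
assert unpack_layer(layer) == (b"\x01\x02", b"\xff", b"octree-depth=12")
```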
To sum up, the decoding method proposed in steps 201 to 204 above may rely on the octree structure, use the constructed embedding network and decoding network, and introduce latent variables to capture the correlation between sibling nodes in the same layer of the octree, so that higher compression performance can be obtained. Meanwhile, performing parallel encoding and decoding processing at the granularity of octree levels can effectively shorten the encoding and decoding time, greatly improve the encoding and decoding efficiency, and further improve the compression performance.
The embodiments of the present application provide a decoding method. At the decoding end, the decoder includes a second decoding network; the decoder decodes the bitstream and determines the reconstructed values of the residuals of the i-th layer nodes obtained by dividing the point cloud according to the octree, where i is greater than 2; determines the reconstructed values of the latent variables of the i-th layer nodes according to the reconstructed values of the residuals of the i-th layer nodes; and inputs the reconstructed values of the latent variables of the i-th layer nodes into the second decoding network to determine the geometric information of the i-th layer nodes. It can thus be seen that, in the embodiments of the present application, when encoding and decoding a point cloud, based on the constructed octree structure, latent variables may be used to characterize the spatial correlation between nodes at the same level of the octree, so that higher compression performance can ultimately be obtained. Meanwhile, based on the constructed networks, parallel encoding and decoding processing is performed on nodes at the same level of the octree, which improves the encoding and decoding efficiency and further improves the compression performance.
Based on the above embodiments, the decoding method proposed in the embodiments of the present application may be understood as an end-to-end deep entropy model for LiDAR point clouds, or as an end-to-end self-supervised dynamic point cloud compression technique based on deep learning. Relying on the octree structure, it introduces latent variables to capture the correlation between octree sibling nodes and uses sparse convolution to construct the networks, thereby achieving the best results among lossless entropy models for LiDAR point clouds.
At the same time, the decoding method proposed in the embodiments of the present application can improve the encoding and decoding efficiency of LiDAR point clouds. The embodiments of the present application have a low encoding and decoding time, because the encoding and decoding of the nodes are processed in parallel on a per-layer basis and the networks are constructed with efficient sparse convolutions; the encoding and decoding time is therefore much lower than that of common encoding and decoding schemes.
It should be noted that the network performing the decoding method proposed in the embodiments of the present application can make full use of hierarchical latent variables to capture the correlation of LiDAR point clouds, and can decompose and compress the latent variables by means of residual coding. Further, a soft addition/subtraction network may be used in place of the original hard addition/subtraction, which improves the flexibility of the network.
Further, in the embodiments of the present application, Figure 19 is a schematic diagram of the overall framework for performing the decoding method. As shown in Figure 19, the decoder may be provided with a multi-layer network, where at least one layer of the multi-layer network may include a second embedding network (embedding) and a second decoding network (decoder). For example, each layer is composed of a decoding network (decoder) and an embedding network (embedding).
It can be understood that, in the embodiments of the present application, the l-th layer network in the multi-layer network can perform parallel decoding processing on the geometric information of the l-th layer nodes.
As shown in the figure, x(l) denotes the geometric information of the l-th layer nodes, e(l) denotes the embedding features of the l-th layer nodes after the occupancy embedding network, f(l) denotes the latent variables of the l-th layer nodes, r̂(l) denotes the reconstructed values of the residuals of the l-th layer, f̂(l) denotes the reconstructed values of the latent variables of the l-th layer nodes, and f̃(l) denotes the predicted values of the latent variables of the l-th layer nodes. The '+' symbol in the figure denotes the soft addition network (addition network).
Exemplarily, in the embodiments of the present application, as shown in the figure, the network provided in the decoder may be composed of five layer networks. For the l-th layer network, the geometric information x(l) of the l-th layer nodes passes through the embedding network to obtain the occupancy embedding e(l) (the embedding features). The decoding network is divided into two paths, which respectively generate the current layer x(l) of the octree and the predicted values of the latent variables of the nodes of the layer above.
It should be noted that, in the embodiments of the present application, for the first layer (the (l-5)-th layer as shown in the figure), the decoder may first determine the reconstructed values f̂(l-5) of the latent variables of the first-layer nodes; the reconstructed values of the latent variables of the first-layer nodes and the preset embedding features e(l-6) may then be output to the second decoding unit of the first-layer network, which may then output the geometric information of the first-layer nodes and the predicted values of the latent variables of the second-layer nodes, respectively.
Further, in the embodiments of the present application, as shown in the figure, the input of the second decoding network of the (l-1)-th layer network may include: the embedding features of the (l-3)-th layer nodes output by the second embedding network of the (l-3)-th layer network, the embedding features of the (l-2)-th layer nodes output by the second embedding network of the (l-2)-th layer network, and the reconstructed values of the latent variables of the (l-2)-th layer nodes obtained by the (l-2)-th layer network; the input of the second decoding network of the (l-1)-th layer network may further include the reconstructed values of the latent variables of the (l-1)-th layer nodes. Correspondingly, when generating the geometric information of the (l-1)-th layer nodes, the decoder may first input the reconstructed values of the latent variables of the (l-1)-th layer nodes, the embedding features of the (l-3)-th layer nodes, the embedding features of the (l-2)-th layer nodes, and the reconstructed values of the latent variables of the (l-2)-th layer nodes into the second decoding network of the (l-1)-th layer network, thereby generating the geometric information of the (l-1)-th layer nodes.
It should be noted that, in the embodiments of the present application, when determining the predicted values of the latent variables of the l-th layer nodes, the decoder may input the reconstructed values of the latent variables of the (l-1)-th layer nodes, the embedding features of the (l-1)-th layer nodes, the embedding features of the (l-2)-th layer nodes, and the reconstructed values of the latent variables of the (l-2)-th layer nodes into the second decoding network of the (l-1)-th layer network, so as to determine the predicted values of the latent variables of the l-th layer nodes.
Further, in the embodiments of the present application, as shown in the figure, after the second decoding network of the (l-1)-th layer network outputs the predicted values of the latent variables of the l-th layer nodes, the predicted values of the latent variables of the l-th layer nodes may be used in the subsequent decoding processing of the l-th layer nodes, where the reconstructed values of the latent variables of the l-th layer nodes are determined by means of the addition network.
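The layer-by-layer decoding loop described above, combine the predicted latent with the decoded residual, then run the two-branch decoder, can be sketched as follows. Every function body is an invented stand-in for a trained network (a plain addition replaces the learned soft-addition network, and rounding replaces the entropy-decoding branch); only the data flow mirrors the text.

```python
# Minimal sketch of the decode loop across layers, from the first (coarsest
# decodable) layer upward. All component functions are illustrative stand-ins.

def soft_add(pred, resid):
    return [p + r for p, r in zip(pred, resid)]    # stand-in for the addition network

def decode_layer(latent, embed_prev):
    geometry = [round(v) for v in latent]          # stand-in for branch 1 (x(l))
    next_pred = [v * 0.5 for v in latent]          # stand-in for branch 2 (f̃(l+1))
    return geometry, next_pred

def decode_all(residuals, first_pred, preset_embed):
    pred, embed = first_pred, preset_embed
    out = []
    for resid in residuals:                        # one entry per layer
        latent = soft_add(pred, resid)             # reconstructed latent f̂
        geometry, pred = decode_layer(latent, embed)
        embed = geometry                           # feeds the layer above
        out.append(geometry)
    return out

geo = decode_all([[0.2, 0.8], [0.1, 0.1]],
                 first_pred=[0.0, 0.0],            # no prediction before layer 1
                 preset_embed=None)                # stands in for e(l-6)
assert len(geo) == 2
```

The zero `first_pred` models the first layer, whose latent is reconstructed directly from its decoded residual together with the preset embedding features.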
Exemplarily, in the embodiments of the present application, the decoding network (decoder) is used to generate, respectively, the current layer x(l) of the octree and the predicted values of the latent variables of the upper-layer nodes. One branch includes: after f̂(l), f̂(l-1), e(l-1) and e(l-2) are obtained, they may be concatenated through the Concatenate network; the probability p(l) of each node in the l-th layer may then be predicted through a two-layer sparse convolution network, and finally x(l) may be generated by entropy decoding through the arithmetic coder. The other branch includes: after f̂(l), f̂(l-1), e(l) and e(l-1) are obtained, they may be concatenated through the Concatenate network, then passed through a deconvolution layer with a kernel size of 2 and a stride of 2, and the predicted values f̃(l+1) of the latent variables of the (l+1)-th layer nodes may then be obtained through the initial residual network.
Exemplarily, in the embodiments of the present application, the embedding network (embedding) may be used to map octree nodes from 256-dimensional one-hot vectors into a 16-dimensional continuous feature space, so that occupancy patterns that are close in space are also close in the feature space.
Exemplarily, in the embodiments of the present application, at least one layer of the multi-layer network provided in the decoder may further include an addition network. A soft addition network (addition network) is used in place of the original hard addition, which greatly improves the flexibility of the network.
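The contrast between a hard addition and a learned "soft" addition can be illustrated as follows. The specification does not detail the soft addition network's internals, so the gated form and all weights below are invented purely for illustration; the only grounded point is that a hard addition is the fixed sum f̂ = f̃ + r̂, while a learned network may combine the two inputs more flexibly.

```python
# Hard addition: a fixed element-wise sum of predicted latent and residual.
# "Soft" addition (hypothetical form): a small parameterized fusion whose
# weights would be learned jointly with the rest of the networks.

def hard_add(pred, resid):
    return [p + r for p, r in zip(pred, resid)]

def soft_add(pred, resid, w_pred=0.9, w_resid=1.1, bias=0.0):
    # real networks could use per-channel or input-dependent weights;
    # scalar weights keep the sketch minimal
    return [w_pred * p + w_resid * r + bias for p, r in zip(pred, resid)]

pred, resid = [1.0, 2.0], [0.5, -0.5]
assert hard_add(pred, resid) == [1.5, 1.5]
out = soft_add(pred, resid)
assert abs(out[0] - 1.45) < 1e-9 and abs(out[1] - 1.25) < 1e-9
```

With `w_pred = w_resid = 1` and zero bias, the soft version reduces exactly to the hard addition, so the learned network can only do at least as well.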
To sum up, the decoding method proposed in the embodiments of the present application may rely on the octree structure, use the constructed embedding network, encoding network and decoding network, and introduce latent variables to capture the correlation between sibling nodes in the same layer of the octree, so that higher compression performance can be obtained. Meanwhile, performing parallel encoding and decoding processing at the granularity of octree levels can effectively shorten the encoding and decoding time, greatly improve the encoding and decoding efficiency, and further improve the compression performance.
The embodiments of the present application provide a decoding method. At the decoding end, the decoder includes a second decoding network; the decoder decodes the bitstream and determines the reconstructed values of the residuals of the i-th layer nodes obtained by dividing the point cloud according to the octree, where i is greater than 2; determines the reconstructed values of the latent variables of the i-th layer nodes according to the reconstructed values of the residuals of the i-th layer nodes; and inputs the reconstructed values of the latent variables of the i-th layer nodes into the second decoding network to determine the geometric information of the i-th layer nodes. It can thus be seen that, in the embodiments of the present application, when encoding and decoding a point cloud, based on the constructed octree structure, latent variables may be used to characterize the spatial correlation between nodes at the same level of the octree, so that higher compression performance can ultimately be obtained. Meanwhile, based on the constructed networks, parallel encoding and decoding processing is performed on nodes at the same level of the octree, which improves the encoding and decoding efficiency and further improves the compression performance.
Based on the above embodiments, in yet another embodiment of the present application, based on the same inventive concept as the foregoing embodiments, Figure 20 is a first schematic structural diagram of the encoder. As shown in Figure 20, the encoder 10 may include: a first determining unit 11, an encoding unit 12, and a generating unit 13; wherein,
the first determining unit 11 is configured to input the geometric information of the i-th layer nodes, obtained by dividing the point cloud according to the octree, into the first embedding network to determine the embedding features of the i-th layer nodes, where i is greater than 2; input the embedding features of the i-th layer nodes into the encoding network to determine the latent variables of the i-th layer nodes; determine the reconstructed values of the residuals of the i-th layer nodes according to the latent variables of the i-th layer nodes, and determine the reconstructed values of the latent variables of the i-th layer nodes according to the reconstructed values of the residuals of the i-th layer nodes;
the encoding unit 12 is configured to write the reconstructed values of the residuals of the i-th layer nodes into the bitstream;
the generating unit 13 is configured to input the reconstructed values of the latent variables of the i-th layer nodes into the first decoding network to generate the bitstream of the geometric information of the i-th layer nodes.
It can be understood that, in this embodiment, a "unit" may be part of a circuit, part of a processor, part of a program or software, and so on; it may of course also be a module, and may also be non-modular. Moreover, the components in this embodiment may be integrated into one processing unit, or the units may exist physically separately, or two or more units may be integrated into one unit. The above integrated unit may be implemented in the form of hardware, or in the form of a software functional module.
If the integrated unit is implemented in the form of a software functional module and is not sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this embodiment in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the method described in this embodiment. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disc.
Therefore, the embodiments of the present application provide a computer-readable storage medium, applied to the encoder 10, where the computer-readable storage medium stores a computer program, and the computer program, when executed by a first processor, implements the method described in any one of the foregoing embodiments.
Based on the composition of the above encoder 10 and the computer-readable storage medium, Figure 21 is a second schematic structural diagram of the encoder. As shown in Figure 21, the encoder 10 may include: a first memory 14, a first processor 15, a first communication interface 16, and a first bus system 17. The first memory 14, the first processor 15, and the first communication interface 16 are coupled together through the first bus system 17. It can be understood that the first bus system 17 is used to implement connection and communication between these components. In addition to a data bus, the first bus system 17 also includes a power bus, a control bus, and a status signal bus. However, for clarity of description, the various buses are all labeled as the first bus system 17 in the figure; wherein,
the first communication interface 16 is used for receiving and sending signals in the process of transmitting and receiving information with other external network elements;
the first memory 14 is used for storing a computer program capable of running on the first processor;
the first processor 15 is used for, when running the computer program, inputting the geometric information of the i-th layer nodes, obtained by dividing the point cloud according to the octree, into the first embedding network to determine the embedding features of the i-th layer nodes, where i is greater than 2; inputting the embedding features of the i-th layer nodes into the encoding network to determine the latent variables of the i-th layer nodes; determining the reconstructed values of the residuals of the i-th layer nodes according to the latent variables of the i-th layer nodes, and determining the reconstructed values of the latent variables of the i-th layer nodes according to the reconstructed values of the residuals of the i-th layer nodes; and inputting the reconstructed values of the latent variables of the i-th layer nodes into the first decoding network to generate the bitstream of the geometric information of the i-th layer nodes.
可以理解,本申请实施例中的第一存储器14可以是易失性存储器或非易失性存储器,或可包括易失性和非易失性存储器两者。其中,非易失性存储器可以是只读存储器(Read-Only Memory,ROM)、可编程只读存储器(Programmable ROM,PROM)、可擦除可编程只读存储器(Erasable PROM,EPROM)、电可擦除可编程只读存储器(Electrically EPROM,EEPROM)或闪存。易失性存储器可以是随机存取存储器(Random Access Memory,RAM),其用作外部高速缓存。通过示例性但不是限制性说明,许多形式的RAM可用,例如静态随机存取存储器(Static RAM,SRAM)、动态随机存取存储器(Dynamic RAM,DRAM)、同步动态随机存取存储器(Synchronous DRAM,SDRAM)、双倍数据速率同步动态随机存取存储器(Double Data Rate SDRAM,DDRSDRAM)、增强型同步动态随机存取存储器(Enhanced SDRAM,ESDRAM)、同步连接动态随机存取存储器(Synchlink DRAM,SLDRAM)和直接内存总线随机存取存储器(Direct Rambus RAM,DRRAM)。本申请描述的系统和方法的第一存储器14旨在包括但不限于这些和任意其它适合类型的存储器。It can be understood that the
而第一处理器15可能是一种集成电路芯片,具有信号的处理能力。在实现过程中,上述方法的各步骤可以通过第一处理器15中的硬件的集成逻辑电路或者软件形式的指令完成。上述的第一处理器15可以是通用处理器、数字信号处理器(Digital Signal Processor,DSP)、专用集成电路(Application Specific Integrated Circuit,ASIC)、现成可编程门阵列 (Field Programmable Gate Array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。可以实现或者执行本申请实施例中的公开的各方法、步骤及逻辑框图。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。结合本申请实施例所公开的方法的步骤可以直接体现为硬件译码处理器执行完成,或者用译码处理器中的硬件及软件模块组合执行完成。软件模块可以位于随机存储器,闪存、只读存储器,可编程只读存储器或者电可擦写可编程存储器、寄存器等本领域成熟的存储介质中。该存储介质位于第一存储器14,第一处理器15读取第一存储器14中的信息,结合其硬件完成上述方法的步骤。The
It can be understood that the embodiments described in this application may be implemented by hardware, software, firmware, middleware, microcode, or a combination thereof. For a hardware implementation, the processing unit may be implemented in one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), DSP Devices (DSPDs), Programmable Logic Devices (PLDs), Field-Programmable Gate Arrays (FPGAs), general-purpose processors, controllers, microcontrollers, microprocessors, other electronic units for performing the functions described in this application, or a combination thereof. For a software implementation, the techniques described in this application may be implemented by modules (for example, procedures or functions) that perform the functions described in this application. Software code may be stored in a memory and executed by a processor. The memory may be implemented in the processor or external to the processor.
Optionally, as another embodiment, the first processor 15 is further configured to, when running the computer program, perform the method of any one of the foregoing embodiments.
FIG. 22 is a first schematic diagram of the composition of the decoder. As shown in FIG. 22, the decoder 20 may include a decoding unit 21 and a second determination unit 22, where
the decoding unit 21 is configured to decode the bitstream;
the second determination unit 22 is configured to: determine the reconstructed value of the residual of the layer-i nodes obtained by partitioning the point cloud according to an octree, where i is greater than 2; determine the reconstructed value of the latent variable of the layer-i nodes according to the reconstructed value of the residual of the layer-i nodes; and input the reconstructed value of the latent variable of the layer-i nodes into a second decoding network to determine the geometric information of the layer-i nodes.
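The decoder-side flow described above (reconstructed residual → reconstructed latent variable → second decoding network → geometric information) can be sketched as follows. Both maps, and the use of a per-node prediction as the base for latent reconstruction, are hypothetical stand-ins chosen only to mirror the steps in the text; they are not the trained second decoding network of this application.

```python
# Sketch of the decoder-side data flow of this embodiment. Both maps are
# hypothetical stand-ins (NOT the trained second decoding network).

def reconstruct_latent(predicted_latent, residual_rec):
    # Reconstructed latent value = predicted latent + reconstructed residual
    # (the prediction source is an assumption of this sketch).
    return [p + r for p, r in zip(predicted_latent, residual_rec)]

def second_decoding_network(latent_rec):
    # Hypothetical inverse map from latent values back to the layer-i
    # nodes' geometric information (occupancy codes in 0..255).
    return [round((z + 1.0) / 2.0 * 255.0) for z in latent_rec]

def decode_layer(predicted_latent, residual_rec):
    latent_rec = reconstruct_latent(predicted_latent, residual_rec)
    return second_decoding_network(latent_rec)
```

With a zero prediction, `decode_layer([0.0, 0.0, 0.0], residual_rec)` recovers the geometry directly from the reconstructed residuals carried by the bitstream.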
It can be understood that, in this embodiment, a "unit" may be part of a circuit, part of a processor, part of a program or software, or the like; it may also be a module, or it may be non-modular. Moreover, the components in this embodiment may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit. The above integrated unit may be implemented in the form of hardware or in the form of a software functional module.
If the integrated unit is implemented in the form of a software functional module and is not sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this embodiment, in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the method described in this embodiment. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disc.
Therefore, an embodiment of the present application provides a computer-readable storage medium, applied to the decoder 20. The computer-readable storage medium stores a computer program which, when executed by a second processor, implements the method of any one of the foregoing embodiments.
Based on the above composition of the decoder 20 and the computer-readable storage medium, FIG. 23 is a second schematic diagram of the composition of the decoder. As shown in FIG. 23, the decoder 20 may include a second memory 23, a second processor 24, a second communication interface 25, and a second bus system 26. The second memory 23, the second processor 24, and the second communication interface 25 are coupled together through the second bus system 26. It can be understood that the second bus system 26 is used to implement connection and communication between these components. In addition to a data bus, the second bus system 26 also includes a power bus, a control bus, and a status signal bus. However, for clarity of illustration, the various buses are all labeled as the second bus system 26 in FIG. 23. Specifically,
the second communication interface 25 is configured to receive and send signals in the process of transmitting and receiving information with other external network elements;
the second memory 23 is configured to store a computer program capable of running on the second processor 24;
the second processor 24 is configured to, when running the computer program: decode the bitstream, and determine the reconstructed value of the residual of the layer-i nodes obtained by partitioning the point cloud according to an octree, where i is greater than 2; determine the reconstructed value of the latent variable of the layer-i nodes according to the reconstructed value of the residual of the layer-i nodes; and input the reconstructed value of the latent variable of the layer-i nodes into the second decoding network to determine the geometric information of the layer-i nodes.
It can be understood that the second memory 23 in the embodiments of the present application may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memory. The non-volatile memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically EPROM (EEPROM), or a flash memory. The volatile memory may be a Random Access Memory (RAM), which is used as an external cache. By way of illustrative but not restrictive description, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and Direct Rambus RAM (DRRAM). The second memory 23 of the systems and methods described in this application is intended to include, but is not limited to, these and any other suitable types of memory.
The second processor 24 may be an integrated circuit chip with signal processing capability. In an implementation process, the steps of the above method may be completed by an integrated logic circuit of hardware in the second processor 24 or by instructions in the form of software. The above second processor 24 may be a general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the methods disclosed in the embodiments of the present application may be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the second memory 23; the second processor 24 reads the information in the second memory 23 and completes the steps of the above method in combination with its hardware.
It can be understood that the embodiments described in this application may be implemented by hardware, software, firmware, middleware, microcode, or a combination thereof. For a hardware implementation, the processing unit may be implemented in one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), DSP Devices (DSPDs), Programmable Logic Devices (PLDs), Field-Programmable Gate Arrays (FPGAs), general-purpose processors, controllers, microcontrollers, microprocessors, other electronic units for performing the functions described in this application, or a combination thereof. For a software implementation, the techniques described in this application may be implemented by modules (for example, procedures or functions) that perform the functions described in this application. Software code may be stored in a memory and executed by a processor. The memory may be implemented in the processor or external to the processor.
Embodiments of the present application provide an encoder and a decoder. When a point cloud is encoded and decoded, based on the constructed octree structure, latent variables can be used to characterize the spatial correlation between nodes at the same level of the octree, so that higher compression performance can ultimately be obtained. At the same time, based on the constructed networks, parallel encoding and decoding are performed on nodes at the same level of the octree, which improves coding efficiency and further improves compression performance.
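The level-wise octree partition underlying the "layer-i nodes" above can be sketched as follows. Each level halves the cell size, and the nodes of one level do not depend on one another within that level, which is where the same-level parallelism comes from. The depth and coordinates are illustrative assumptions, not part of the application.

```python
# Sketch of the octree partition that yields the "layer-i nodes": each
# level halves the cell size; all nodes of one level are independent of
# one another and could be batched to the networks in parallel.
# Depth and coordinates are illustrative assumptions.

def octree_levels(points, depth):
    levels = []
    for i in range(1, depth + 1):
        shift = depth - i
        # A node at level i is the set of points sharing the same
        # coordinates truncated to level-i resolution.
        nodes = sorted({(x >> shift, y >> shift, z >> shift)
                        for (x, y, z) in points})
        levels.append(nodes)
    return levels
```

For example, with `depth=3`, level 1 partitions the points among at most eight cells; each level's node list could be fed to the embedding network as one batch.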
It should be noted that, in the embodiments of the present application, the terms "comprise", "include", or any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or apparatus including a series of elements includes not only those elements but also other elements not expressly listed, or further includes elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that includes the element.
The above serial numbers of the embodiments of the present application are for description only and do not represent the superiority or inferiority of the embodiments.

The methods disclosed in the several method embodiments provided in this application may be combined arbitrarily without conflict to obtain new method embodiments.

The features disclosed in the several product embodiments provided in this application may be combined arbitrarily without conflict to obtain new product embodiments.

The features disclosed in the several method or device embodiments provided in this application may be combined arbitrarily without conflict to obtain new method embodiments or device embodiments.
The above are only specific implementations of the present application, but the protection scope of the present application is not limited thereto. Any person skilled in the art can easily conceive of changes or substitutions within the technical scope disclosed in the present application, and these shall all fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Embodiments of the present application provide an encoding method, a decoding method, an encoder, a decoder, and a storage medium. On the encoding side, the encoder includes a first embedding network, an encoding network, and a first decoding network. The encoder inputs the geometric information of the layer-i nodes, obtained by partitioning the point cloud according to an octree, into the first embedding network, and determines the embedding features of the layer-i nodes, where i is greater than 2; inputs the embedding features of the layer-i nodes into the encoding network, and determines the latent variable of the layer-i nodes; determines the reconstructed value of the residual of the layer-i nodes according to the latent variable of the layer-i nodes, and determines the reconstructed value of the latent variable of the layer-i nodes according to the reconstructed value of the residual of the layer-i nodes; and inputs the reconstructed value of the latent variable of the layer-i nodes into the first decoding network to generate a bitstream of the geometric information of the layer-i nodes. On the decoding side, the decoder includes a second decoding network. The decoder decodes the bitstream and determines the reconstructed value of the residual of the layer-i nodes obtained by partitioning the point cloud according to an octree, where i is greater than 2; determines the reconstructed value of the latent variable of the layer-i nodes according to the reconstructed value of the residual of the layer-i nodes; and inputs the reconstructed value of the latent variable of the layer-i nodes into the second decoding network to determine the geometric information of the layer-i nodes. It can thus be seen that, in the embodiments of the present application, when a point cloud is encoded and decoded, based on the constructed octree structure, latent variables can be used to characterize the spatial correlation between nodes at the same level of the octree, so that higher compression performance can ultimately be obtained; at the same time, based on the constructed networks, parallel encoding and decoding are performed on nodes at the same level of the octree, which improves coding efficiency and further improves compression performance.
Claims (37)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202280097263.1A CN119384681A (en) | 2022-09-22 | 2022-09-22 | Coding and decoding method, encoder, decoder and storage medium |
PCT/CN2022/120683 WO2024060161A1 (en) | 2022-09-22 | 2022-09-22 | Encoding method, decoding method, encoder, decoder and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2022/120683 WO2024060161A1 (en) | 2022-09-22 | 2022-09-22 | Encoding method, decoding method, encoder, decoder and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024060161A1 true WO2024060161A1 (en) | 2024-03-28 |
Family
ID=90453546
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2022/120683 WO2024060161A1 (en) | 2022-09-22 | 2022-09-22 | Encoding method, decoding method, encoder, decoder and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN119384681A (en) |
WO (1) | WO2024060161A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN119722825A (en) * | 2025-02-27 | 2025-03-28 | 烟台大学 | A point cloud geometry compression method and system based on inter-layer residual and IRN connection residual |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200401900A1 (en) * | 2019-06-18 | 2020-12-24 | Samsung Electronics Co., Ltd. | Apparatus for performing class incremental learning and method of operating the apparatus |
CN113766228A (en) * | 2020-06-05 | 2021-12-07 | Oppo广东移动通信有限公司 | Point cloud compression method, encoder, decoder and storage medium |
CN114092631A (en) * | 2020-08-24 | 2022-02-25 | 鹏城实验室 | Point cloud attribute encoding method and decoding method based on weighted 3D Haar transform |
US20220101145A1 (en) * | 2020-09-25 | 2022-03-31 | Nvidia Corporation | Training energy-based variational autoencoders |
WO2022166957A1 (en) * | 2021-02-08 | 2022-08-11 | 荣耀终端有限公司 | Point cloud data preprocessing method, point cloud geometry coding method and device, and point cloud geometry decoding method and device |
US20220272345A1 (en) * | 2020-10-23 | 2022-08-25 | Deep Render Ltd | Image encoding and decoding, video encoding and decoding: methods, systems and training methods |
- 2022-09-22: WO PCT/CN2022/120683 (WO2024060161A1), status unknown
- 2022-09-22: CN CN202280097263.1A (CN119384681A), active, pending
Also Published As
Publication number | Publication date |
---|---|
CN119384681A (en) | 2025-01-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN117097898A (en) | Decoding and encoding method based on point cloud attribute prediction, decoder and encoder | |
WO2022121648A1 (en) | Point cloud data encoding method, point cloud data decoding method, device, medium, and program product | |
CN118614061A (en) | Coding and decoding method, encoder, decoder and storage medium | |
WO2023103565A1 (en) | Point cloud attribute information encoding and decoding method and apparatus, device, and storage medium | |
TW202404359A (en) | Codec methods, encoders, decoders and readable storage media | |
CN115474041A (en) | Point cloud attribute prediction method and device and related equipment | |
KR20240064698A (en) | Feature map encoding and decoding method and device | |
WO2024060161A1 (en) | Encoding method, decoding method, encoder, decoder and storage medium | |
CN117321991A (en) | Prediction method, device and codec for point cloud attributes | |
CN115086716A (en) | Method and device for selecting neighbor points in point cloud and coder/decoder | |
WO2024174086A1 (en) | Decoding method, encoding method, decoders and encoders | |
WO2024148473A1 (en) | Encoding method and apparatus, encoder, code stream, device, and storage medium | |
TW202408231A (en) | Encoding method, decoding method, code stream, encoder, decoder and storage medium | |
WO2024221458A1 (en) | Point cloud encoding/decoding method and apparatus, device, and storage medium | |
WO2024178542A1 (en) | Coding method, decoding method, encoders, decoders and storage medium | |
CN116325732A (en) | Decoding and encoding method, decoder, encoder and encoding and decoding system of point cloud | |
WO2024178565A1 (en) | Decoding method, decoder and storage medium | |
WO2024159534A1 (en) | Encoding method, decoding method, bitstream, encoder, decoder and storage medium | |
WO2025076659A1 (en) | Point cloud encoding method, point cloud decoding method, code stream, encoder, decoder and storage medium | |
WO2023173238A1 (en) | Encoding method, decoding method, code stream, encoder, decoder, and storage medium | |
WO2025076662A1 (en) | Point cloud encoding method, point cloud decoding method, code stream, encoder, decoder, and storage medium | |
HK40084295A (en) | Point cloud encoding and decoding method, apparatus, device, and storage medium | |
WO2025039236A1 (en) | Coding and decoding method, code stream, encoder, decoder, and storage medium | |
WO2024007144A1 (en) | Encoding method, decoding method, code stream, encoders, decoders and storage medium | |
WO2025145442A1 (en) | Coding method, decoding method, coders, decoders, code stream and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22959169; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |